1 Introduction



In this chapter I will discuss the advantages, in principle, of using nonclassical 
states [1] in communication and measurement situations involving classical in- 
formation transfer. Most of the discussions will be concerned with the quantum 
states of light, in particular, the quadrature squeezed states and number states. 
Thus, optical terminology will be freely employed even though the principles 
are generally applicable to fermions also, and gravitational wave detection by a 
free mass will also be treated. I shall focus on the general theoretical concepts
and principles underlying such applications of nonclassical states, without
extensive mathematical derivations and without reviewing the physics of these
states, which is covered elsewhere in this book. I shall mostly avoid precise
mathematical definitions and formulations, although the treatment is as precise 
as most standard treatments in theoretical physics or engineering science. 

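It is, of course, the defining characteristic of a quadrature squeezed state
that the quantum fluctuation in one of its quadratures is reduced below that of
a coherent state. Let $|\alpha\rangle$ be a coherent state (CS) of an optical field mode with
photon annihilation operator $a = (x + iy)/2$ so that

$$\langle (\Delta x)^2 \rangle = \langle (\Delta y)^2 \rangle = 1. \tag{1}$$

In a two-photon coherent state (TCS) [2] $|\mu\nu\alpha\rangle$, the TCS being the pure quadrature
squeezed states, one obtains with a proper choice of quadrature $x'$

$$\langle (\Delta x')^2 \rangle = (|\mu| - |\nu|)^2, \qquad \langle (\Delta x'_{+\pi/2})^2 \rangle = (|\mu| + |\nu|)^2. \tag{2}$$

Since $|\mu|^2 - |\nu|^2 = 1$, $|\mu\nu\alpha\rangle$ is a minimum uncertainty state on $x'$, $x'_{+\pi/2}$,

$$\langle (\Delta x')^2 \rangle \, \langle (\Delta x'_{+\pi/2})^2 \rangle = 1. \tag{3}$$

For simplicity, let $\mu, \nu, \alpha$ be real and $|\mu - \nu| < 1$, thus

$$\langle \alpha | x | \alpha \rangle = 2\alpha = \langle \mu\nu\alpha | x | \mu\nu\alpha \rangle, \tag{4}$$

$$\langle \alpha | (\Delta x)^2 | \alpha \rangle > \langle \mu\nu\alpha | (\Delta x)^2 | \mu\nu\alpha \rangle. \tag{5}$$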



This is often taken to mean that in the proper quadrature, a TCS is less noisy
than a CS and so is better for communication and measurement. However,
(4)-(5) is not a proper justification of such an assertion.

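First of all, the states $|\mu\nu\alpha\rangle$ and $|\alpha\rangle$, which is $|\mu\nu\alpha\rangle$ with $\nu = 0$, have different
energy,

$$\langle \mu\nu\alpha | a^\dagger a | \mu\nu\alpha \rangle = |\alpha|^2 + |\nu|^2. \tag{6}$$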

It is not a priori clear that if a portion of the energy associated with the mean
field $\alpha$ is moved to increase $\langle (\Delta y)^2 \rangle$ so that $\langle (\Delta x)^2 \rangle$ is less, the overall effect is
beneficial. Assuming that a signal-to-noise ratio (SNR) criterion is appropriate
for the present illustration,

$$\mathrm{SNR} \equiv \frac{\langle x \rangle^2}{\langle (\Delta x)^2 \rangle}, \tag{7}$$



it was shown [3] that TCS indeed maximizes the SNR (7) under the constraint of a fixed
energy for an arbitrary state, $\langle a^\dagger a \rangle \le S$, with the result

$$\mathrm{SNR}\big|_{|\mu\nu\alpha\rangle} = 4S(S+1) \tag{8}$$

for $\nu = S/\sqrt{2S+1}$, as compared to $\mathrm{SNR}|_{|\alpha\rangle} = 4S$. Secondly, in communication
with $|\alpha\rangle$, both quadratures can be used to carry information and thus may yield
a higher capacity than the use of $|\mu\nu\alpha\rangle$ with only one quadrature, which is
equivalent to using half of the available bandwidth. It turns out that for the
unrestricted capacity [4], and much more so for the binary signaling capacity [5],
the use of $|\mu\nu\alpha\rangle$ does lead to improvement over $|\alpha\rangle$. The relevant communication
concepts and further details are to be discussed in the sequel. The point here
is that the advantage of $|\mu\nu\alpha\rangle$ over $|\alpha\rangle$ is not as obvious or intuitive as it may
first appear. Similarly, while number states $|n\rangle$ and direct detection produce
a noiseless system, it is discrete as compared to the in-principle continuum of
states $|\alpha\rangle$. Again, it is not a priori clear that $|n\rangle$ would lead to a higher capacity.

The real point involving nonclassical states, I believe, is the following.
Historically or typically in physics, one analyzes a given physical phenomenon and
sees if it can be useful in application, whereas in engineering one often
synthesizes to produce something to perform a certain function efficiently. (This
opposition between analysis and synthesis is, of course, neither absolute nor pervasive
in physics versus engineering.) For a long time after the laser was invented, the
ideal laser state was supposed to be a coherent state, a quantum source one has
to live with. Thus, all practical light sources were supposedly characterized by
classical states, i.e., pure coherent states or their random superposition.
However, states which are not classical, the nonclassical states, are clearly possible
to have, at least in principle. In a synthesis or optimization approach, one would
want to find out whether such states could lead to a better system for the
application under consideration. Thus, the following questions suggest themselves
in any given problem situation: What are the appropriate performance criteria
and resource constraints? What are the best states or state-measurement
combinations one should use according to the criteria and the constraints? How
much better are they compared to the conventional or standard system? The
above discussion surrounding (7) and (8) furnishes an example of answers to
such questions. Typically, the answer would involve quantities that are only
specified mathematically, such as a TCS. If it seems worthwhile to develop such
new systems, further questions on concrete physical realizations would have to
be addressed. In these days of "quantum information", such questions are even
more pervasive and important.
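The SNR comparison at fixed energy $S$ can be checked numerically. The following is a minimal sketch, assuming real $\mu$, $\nu$, $\alpha$ with $\mu^2 - \nu^2 = 1$, energy $\alpha^2 + \nu^2 = S$, mean $\langle x \rangle = 2\alpha$ and squeezed-quadrature variance $(\mu - \nu)^2$; the function names are illustrative and not taken from the text.

```python
import math

def snr_coherent(S):
    """SNR <x>^2 / <(dx)^2> of a coherent state with energy |alpha|^2 = S."""
    return 4.0 * S  # <x> = 2*alpha and unit quadrature variance

def snr_tcs(S, nu):
    """SNR of a TCS with real mu, nu, alpha and total energy alpha^2 + nu^2 = S."""
    alpha_sq = S - nu * nu              # energy left for the mean field
    if alpha_sq < 0:
        raise ValueError("nu^2 cannot exceed the energy budget S")
    mu = math.sqrt(1.0 + nu * nu)       # enforces mu^2 - nu^2 = 1
    return 4.0 * alpha_sq / (mu - nu) ** 2

def optimal_nu(S):
    """Squeezing parameter nu = S / sqrt(2S + 1) quoted with the optimum SNR."""
    return S / math.sqrt(2.0 * S + 1.0)

S = 10.0
print("coherent state:", snr_coherent(S))
print("optimal TCS   :", snr_tcs(S, optimal_nu(S)))
```

Setting `nu = 0` recovers the coherent-state value, and scanning `nu` over the allowed range shows how moving energy from the mean field into squeezing first raises and then lowers the SNR.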
Figure 1: Schematic representation of a classical communication system: for $U$
and $V$ capitals denote random quantities, lower case their samples but no such
distinction is made for $X^{(in)}$ and $X^{(out)}$.



In the following section, I will review some basic concepts in classical
communication, distinguish communication from detection, and discuss how physical
measurement fits into both. In Section 3 the issues of quantum communication
for classical information transfer will be explained. [Note that "quantum
information" is entirely outside the scope of this chapter.] I will discuss the
information capacity of nonclassical states, and what is to date apparently the
only possible useful application of nonclassical states in fiber optic
communication: the use of nonclassical amplifiers and duplicators. In Section 4, I will
discuss the use of nonclassical states in physical measurement problems, and the
communication theoretic limit on the accuracy of measurements. In Section 5
the validity of the standard quantum limit for monitoring free-mass positions is
addressed. Throughout I will try to explain the intuitive relevance of the
various basic communication parameters, to highlight the main ideas with careful
formulation but minimum details, and to dispel a few common misconceptions.
Some results are also presented here for the first time.



2 Classical communication and measurement 
2.1 Classical Information Transmission 

For our purpose, a classical communication system can be schematically
represented by Fig. 1. A source generates a classical quantity $u$, which is a member
of an alphabet set $U$, $u \in U$, which may be discrete or continuous. Since $u$ is
generated probabilistically according to some distribution, the corresponding
random variable is denoted by $U$. The transmitter modulates $u$ onto a
signal $X^{(in)}(t,u)$, which is a time-varying classical function. The channel, which
usually represents all the disturbance in the system from source to destination,
yields an output $X^{(out)}(t,u)$ statistically related to the input $X^{(in)}(t,u)$. The
receiver processes $X^{(out)}(t,u)$ to produce an estimate $v \in U$ of $u$ to satisfy the
performance criteria.

If $U$ is a finite set $\{1, \ldots, M\}$, the criterion of error probability is often
employed. If $U$ is continuous, the mean-square error between $U$ and $V$ is often
taken as the criterion. In both cases the system is designed, subject to
whatever constraints are under consideration, to minimize the error or to produce
a sufficiently small error. In a communication situation, one has joint design
over the transmitter and receiver whereas in a detection situation, one is
concerned only with the receiver design. Thus, in communications one may pick
$X^{(in)}(t,u)$ to influence $X^{(out)}(t,u)$, and in detection one is faced with a given
statistical description of $X^{(out)}(t,u)$. Clearly, communication is broader than
detection. In the communication case it is important to deal explicitly with
the time-sequential nature of the source output, with $u$ regarded as a sequence
$u_1, u_2, \ldots, u_i, \ldots$ with corresponding $X_i^{(in)}(t,u_i)$ and $X_i^{(out)}(t,u_i)$.

The system constraints in both cases are similar. The physical transmission
medium (often together with the unavoidable disturbance in the receiver
structure) specifies the channel representation, the statistical relation between
$X^{(in)}(t,u)$ and $X^{(out)}(t,u)$. Constraints on the channel typically include all the
physical limitations on the transmitter, the medium, and the receiver. They
usually include a power or energy limitation on $X^{(in)}(t,u)$, a total time $T$ and
a total bandwidth $W$ available for transmission and reception. In addition to
small error, the system objectives include moderate implementation complexity,
which is not always easy to quantify, and also large data rate in the case of
communication.

The concept of data rate or information rate is fundamental in commu- 
nication. It is usually measured in bits per second, or bits per use which is 
immediately converted to bits per second when multiplied by uses per second. 
For a data source generating one of $M$ equiprobable messages per $T$ seconds,
the data rate $R$ is defined to be

$$R = (\log_2 M)/T. \tag{9}$$

This definition explicitly indicates that it is the number of message possibilities 
that characterizes the rate of a source. It immediately shows why one can have 
more than one bit per photon. Indeed, one can have an infinite number of 
bits per photon if that photon can fall into, say, one of an infinite number of 
different time slots. For a general statistical source, the Shannon entropy H for 
the source is used, in bits per use of the source or bits per source symbol. A full 
description of communication, information, and detection theory can be found
in [6]-[9]. In the present treatment, only some significant relevant points will
be highlighted.
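The definition of data rate lends itself to a short numerical illustration. The sketch below assumes base-2 logarithms (bits) and treats the "photon in one of many time slots" remark as a count of equiprobable slot positions; the function names are hypothetical.

```python
import math

def data_rate(num_messages, seconds):
    """Bits per second for one of num_messages equiprobable messages every `seconds`."""
    return math.log2(num_messages) / seconds

def bits_per_photon(num_slots):
    """Bits carried by a single photon that may fall into one of num_slots time slots."""
    return math.log2(num_slots)

# One of 1024 messages every 2 seconds.
print(data_rate(1024, 2.0))
# A single photon placed in one of 256 distinguishable slots.
print(bits_per_photon(256))
```

The number of bits per photon grows without bound as the number of distinguishable slots grows, which is the sense in which more than one bit per photon is possible.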

The concept of data rate already forces upon us a fundamental discrete 
view of nature in any realistic physical process. If one can access a true
continuum, or indeed a true discrete infinity (in communications the word "discrete"
often means discrete and finite), one would be able to get infinite data rate,
e.g., when one can distinguish the real numbers between 0 and 1 with infinite
precision. In reality, a continuum can support only a finite number of bits, because
of either unavoidable disturbance or the laws of quantum physics. A discussion
of certain points relating to this finite/infinite dichotomy can be found in [10]. 
Here I would like to emphasize that communication is inherently a finite (dis- 
crete, digital) process. Any continuous quantity would finally appear in some 
discrete fashion in actual utilization. 

Not surprisingly, the desirable goals of large data rate and small error
probability are in conflict with each other. It is easy to see from the law of large
numbers that if one slows down the data rate, say by repeatedly sending the
same message, one can decrease the error probability, indeed to zero
asymptotically but with the data rate also going to zero. What is not obvious, but
given by Shannon's channel coding theorem, is that for a fixed channel
representation, there is a nonzero rate called channel capacity below which one
can transmit with arbitrarily small error probability by using increasingly long
codes. A (channel) code is a signaling scheme in which all the signaling
symbols in a sequence over many uses are processed simultaneously, which clearly
makes the implementation more complex. However, long sequences have the
statistical regularity given by the probabilistic description, similar to the law of
large numbers, which implies, e.g., that in a long sequence of fair coin tosses
roughly one half are heads, whereas nothing comparable can be said about
a single toss or a few tosses. It is this statistical regularity in long
sequences that leads to the possibility of vanishingly small error probability with
a nonzero rate as given by Shannon's theorem.
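The trade-off obtained by simply repeating a message can be made concrete. The following sketch assumes a binary symmetric channel with crossover probability `p` and majority-vote decoding, neither of which is specified in the text; it only illustrates how the error probability and the rate both go to zero under repetition.

```python
from math import comb

def repetition_error(n, p):
    """Majority-vote error probability for n-fold repetition (n odd) with bit-flip probability p."""
    # An error occurs when more than half of the n copies are flipped.
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range((n + 1) // 2, n + 1))

def repetition_rate(n):
    """Information bits per channel use when each bit is sent n times."""
    return 1.0 / n

for n in (1, 3, 5, 11, 21):
    print(n, repetition_rate(n), repetition_error(n, 0.1))
```

The coding theorem says something stronger than this table: with suitably long codes the error probability can be driven down while the rate stays fixed at any value below capacity.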

The maximum such nonzero rate under whatever constraints and specifications
on a channel is called the capacity of that constrained or specific channel,
and is equal to the mutual information between the channel input and
output. Referring to Fig. 1, we will later discuss the time-varying signal
aspect but for the moment consider just channel input $X^{(in)}$ and output $X^{(out)}$
from alphabets $\mathcal{X}$ and $\mathcal{Y}$ with the channel specified statistically by the
conditional probability $p(X^{(out)}|X^{(in)})$, $X^{(out)} \in \mathcal{Y}$, $X^{(in)} \in \mathcal{X}$, interpreted as a
probability density or probability mass according to whether the alphabet is
continuous or discrete. With an input probability $p(X^{(in)})$, the joint probability
$p(X^{(out)}, X^{(in)}) = p(X^{(out)}|X^{(in)})\,p(X^{(in)})$ completely specifies the channel
action and the mutual information $I(X;Y)$ is defined by, in the continuous case,

$$I(X;Y) = \int p(X^{(in)}, X^{(out)}) \log \frac{p(X^{(in)}|X^{(out)})}{p(X^{(in)})}\, dX^{(in)}\, dX^{(out)}, \tag{10}$$

and similarly, in the discrete case,

$$I(X;Y) = \sum_{X^{(in)},\, X^{(out)}} p(X^{(in)}, X^{(out)}) \log \frac{p(X^{(in)}|X^{(out)})}{p(X^{(in)})}. \tag{11}$$



The Shannon entropy $H(U)$ of a single random variable $U$ can be defined as
average self information, or

$$H(U) = -\int p(u) \log p(u)\, du, \tag{12}$$

$$H(U) = -\sum_u p(u) \log p(u) \tag{13}$$

in the continuous and discrete case. Note that (13) is always nonnegative while
(12) can be negative. Shannon's source-channel coding theorem and its converse
[7,9,12] state that successive independent samples of a discrete $U$ can be
transmitted over a memoryless channel $p(X^{(out)}|X^{(in)})$ with arbitrarily small (but not
exactly zero) error probability between $U$ and $V$ (see Fig. 1) if $H(U) < I(X;Y)$,
and the block error $P_e \to 1$ for $H(U) > I(X;Y)$. The case $H(U) = I(X;Y)$
forms a boundary with $P_e$ bounded away from zero in general unless the
channel is noiseless. It is important to observe the conceptual distinction between a
source output U and a channel input X, even though they may happen to be the 
same physical quantity. Note also that a continuous alphabet channel in reality 
still has a finite capacity and so can reliably transmit only a discrete quantity. 
If $U$ is a continuous random variable, some performance criterion such as
mean-square error would need to be adopted, and this error cannot be made vanishingly small.
The extent to which it can be minimized is dealt with in rate-distortion theory
[7-13] discussed in Section 4. It is important to note that for a noisy channel,
the use of long codes to obtain a reliable system with high rate significantly 
increases the system complexity, especially in the decoding operation. 
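The discrete entropy and mutual information can be evaluated directly for a small channel. The sketch below assumes base-2 logarithms and a channel given as a table of conditional probabilities `channel[i][j] = p(out = j | in = i)`; it uses the equivalent form $\log[p(\mathrm{out}|\mathrm{in})/p(\mathrm{out})]$ of the logarithm in the mutual information, which equals $\log[p(\mathrm{in}|\mathrm{out})/p(\mathrm{in})]$ by Bayes' rule. The binary symmetric channel used as the example is an illustrative choice.

```python
import math

def entropy(probabilities):
    """Discrete Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in probabilities if q > 0)

def mutual_information(p_in, channel):
    """I(X;Y) in bits for input distribution p_in and channel[i][j] = p(out=j | in=i)."""
    n_out = len(channel[0])
    # Output distribution induced by the input distribution and the channel.
    p_out = [sum(p_in[i] * channel[i][j] for i in range(len(p_in)))
             for j in range(n_out)]
    total = 0.0
    for i, p_i in enumerate(p_in):
        for j in range(n_out):
            joint = p_i * channel[i][j]
            if joint > 0:
                total += joint * math.log2(channel[i][j] / p_out[j])
    return total

# Binary symmetric channel with 10% crossover and equiprobable inputs.
bsc = [[0.9, 0.1], [0.1, 0.9]]
print(mutual_information([0.5, 0.5], bsc))
# A binary source with entropy below this value can be sent reliably, one symbol per use.
print(entropy([0.9, 0.1]))
```

Changing `p_in` changes the mutual information; the capacity in the usual sense is the largest value obtained over all admissible input distributions.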

The name "capacity" is usually applied to the $I(X;Y)$ maximized with
respect to $p(X^{(in)})$ under whatever constraints, but it is also used to refer to
whatever maximum $I(X;Y)$ is obtained under different restrictions on the utilization
of a given channel, e.g., under discretization (usually called quantization in the
communication and signal processing literature) of the input and output of a
continuous channel. The point is that with various special restrictions including
a fixed $p(X^{(in)})$, a given channel would give rise to many other channels, each
with its own "capacity." Even more proliferation occurs in the quantum case. 
It is essential to understand the exact conditions under which a so-called "ca- 
pacity" is obtained, for it is often not a really meaningful capacity in the sense 
of ultimate capability limit on the transmission medium or system. 



2.2 Signal, noise and dimensionality 

I will now try to describe qualitatively the effect of noise on data rate, finally
leading to the famous Shannon capacity formula for an additive white Gaussian
noise (AWGN) channel which is directly applicable to squeezed states. Let
$P$ and $N$ be the total average signal and noise power of an AWGN channel
represented by

$$X^{(out)}(t) = X^{(in)}(t) + n(t), \tag{14}$$

where $n(t)$ is the white noise. Let $W$ be the available bandwidth, i.e., the
extent in frequency occupied by the signals $X^{(in)}(t)$. Then the capacity-achieving
input signal is a white Gaussian process, with resulting

$$C = W \log\left(1 + \frac{P}{N}\right). \tag{15}$$

In terms of the noise spectral density $N_0$, with $N = N_0 W$, one has the famous formula

$$C = W \log\left(1 + \frac{P}{N_0 W}\right). \tag{16}$$

Equation (15) can be derived from the mutual information expression (10), as
given by Shannon [11] and later more rigorously in [7]. The intuitive reason why
(15) takes the form it does, according to such a derivation, would then have to
be traced through the reason why (10) or (11) provides a general capacity for
information transmission. In the discrete case (11), this can be gleaned from
Shannon's original proof in [11], and the continuous case may be viewed as a
discrete limit as developed in [7]. However, a direct approach can be given for
a Gaussian channel, also provided by Shannon [13], which explains the nature
of various relevant quantities quite succinctly.
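The AWGN capacity formula is easy to evaluate. The following minimal sketch assumes base-2 logarithms, so that the result is in bits per second, and takes the total noise power to be $N = N_0 W$; the function names and the sample numbers are illustrative.

```python
import math

def awgn_capacity(bandwidth, signal_power, noise_power):
    """C = W log2(1 + P/N) in bits per second."""
    return bandwidth * math.log2(1.0 + signal_power / noise_power)

def awgn_capacity_density(bandwidth, signal_power, noise_density):
    """Same capacity written with the noise spectral density N0, using N = N0 * W."""
    return awgn_capacity(bandwidth, signal_power, noise_density * bandwidth)

# Bandwidth 1 Hz, signal power 3, noise power 1 (arbitrary consistent units).
print(awgn_capacity(1.0, 3.0, 1.0))
# Doubling the bandwidth at fixed N0 also doubles the total noise power.
print(awgn_capacity_density(2.0, 3.0, 1.0))
```

The second call shows that capacity does not simply scale with $W$ when $N_0$ is held fixed, because the noise admitted by the receiver grows with the bandwidth.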

Consider the transmission and reception of a single continuous real variable
in noise

$$X^{(out)} = X^{(in)} + n. \tag{17}$$

If $X^{(in)}$ is restricted to an interval of length $L$, an infinite number of bits per
channel use is obtained in the absence of noise, $n = 0$, for any $L > 0$. If the
noise $n$ always has value in the interval $[-\Delta/2, \Delta/2]$, the number of distinguishable
values per use is reduced to the finite

$$(L + \Delta)/\Delta \tag{18}$$

including edge effects, i.e., to $\log_2[(L + \Delta)/\Delta]$ bits per use. If $X$ and $n$ are
independent continuous random variables with variances $P$ and $N$, or standard
deviations $\sqrt{P}$ and $\sqrt{N}$, a crude estimate patterned after (18) would suggest that
the number of amplitudes that can be well distinguished, whose logarithm gives the
number of bits per use, is

$$\sim k\sqrt{(P + N)/N}, \tag{19}$$

where $k$ is a small constant in the neighborhood of unity depending on how "well
distinguished" is to be interpreted. We may recall that earlier in this section it was
mentioned that in a long sequence of independent trials, statistical regularity
appears and provides deterministic features to the sequence. This kind of effect
would indeed turn the approximate relation (19) into an exact one similar to
(15). In the case of time-varying signals, this comes about in the long signal
duration $T$ limit as follows.

First of all, the collection of time functions of "approximate time duration"
$T$ and "approximate bandwidth" $W$ spans a linear space of dimension

$$D = 2TW \tag{20}$$

according to the Dimensionality Theorem [6], an improved version of the
sampling theorem [13]. The word "approximate" above is necessary because no time
function with a Fourier transform can be both strictly time-limited and strictly
band-limited, but the exact definitions of "approximate" do not alter the final
result (20) [14]. The Dimensionality Theorem (20), as I discussed elsewhere
[15], has momentous consequences in the description of nature. Here, it cuts
down, even in the absence of noise, an otherwise infinite dimensional space to
a finite dimension in a realistic system where both $T$ and $W$ have to be finite.
Thus, a signal or time function can be viewed geometrically as a point in a finite
dimensional Hilbert space. (The linear space is readily given an inner product
via $\int x_1^{(in)}(t)\, x_2^{(in)}(t)\, dt$.)
Figure 2: Geometric representation of the receiver: four possible signals 1 to 4
in a 2-dimensional space with corresponding minimum distance decision regions
I to IV formed by the dotted lines. The additive white Gaussian noise vector $n_1$
added to signal 1 would be decoded correctly, while the noise vector $n_2$ pushes
signal 1 to the decision region IV for signal 4, and would be decoded incorrectly.



In this geometric representation, the effect of an additive noise is to add a
noise vector to the signal vector. The effect of a power constraint $P$ on $X^{(in)}(t)$
is to have it lie inside a $D$-dimensional sphere of radius $\sqrt{P}$ in the whole space
$\mathbb{R}^D$, the Euclidean space of dimension $D$. For white Gaussian processes, the
coefficients of its expansion in any orthonormal basis are independent Gaussian
random variables, with variances all given by the same quantity, the average
power of the process [6-8,13]. The receiver looks at the received point $A$ in
the signal space, and picks the nearest signal point in Euclidean distance to
$A$ for minimizing the error probability assuming equiprobable messages. The
situation is illustrated in Fig. 2.
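The minimum-distance receiver described above can be sketched in a few lines. The four signal points and the two noise vectors below are hypothetical values chosen only to mimic the situation of a small and a large noise vector added to the first signal; they are not taken from the figure.

```python
import math

def nearest_signal(received, signals):
    """Index of the signal point closest to `received` in Euclidean distance."""
    return min(range(len(signals)), key=lambda i: math.dist(received, signals[i]))

def add_noise(signal, noise):
    """Received point: signal vector plus noise vector."""
    return tuple(s + n for s, n in zip(signal, noise))

# Four signals in a 2-dimensional space (indices 0..3 stand for signals 1..4).
signals = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]

small_noise = (0.3, -0.2)   # stays inside the decision region of signal 1
large_noise = (0.2, -2.5)   # pushes the point into the region of signal 4

print(nearest_signal(add_noise(signals[0], small_noise), signals))
print(nearest_signal(add_noise(signals[0], large_noise), signals))
```

For equiprobable signals in white Gaussian noise this nearest-point rule is the error-minimizing decision, which is why the decision regions are bounded by perpendicular bisectors between signal points.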

Thus, in time $T$ there are $2TW$ independent Gaussian amplitudes from (20),
and from (19) the total number of well distinguished signals is

$$M = \left( k \sqrt{\frac{P + N}{N}} \right)^{2TW}. \tag{21}$$

The number of bits per second is, from (9),

$$R = \frac{\log M}{T} = W \log\left[ k^2 \left( 1 + \frac{P}{N} \right) \right]. \tag{22}$$

The capacity formula (15) for AWGN channel follows from (22) with $k = 1$.
It comes about more precisely as follows. As a result of the statistical
regularity mentioned above, for large $T$ the signals $X^{(in)}(t)$ must almost all lie on
a sphere of radius $\sqrt{2TWP}$, signal plus noise on a sphere of radius
$\sqrt{2TW(P + N)}$, with noise $n(t)$ on a sphere of radius $\sqrt{2TWN}$ centered at
the original signal point. Note that the Euclidean structure of the signal space,
which is absent in the general discrete case, is crucial here: the different
coordinate values of, say, $n(t)$ in the $D$-dimensional space are Gaussian distributed,
yielding the given average value $DN$ for the squared norm $\int n^2(t)\, dt$ of the $n(t)$ vector,
so that $n(t)$ is on a sphere of radius $\sqrt{DN}$ around the source signal point. For
arbitrarily small error probability, one would want the noise spheres around
different signal points to overlap arbitrarily little. A "sphere-packing" argument
(see (35) below also) then readily establishes the converse to the coding theorem
for (15), namely that it is impossible to transmit with arbitrarily small error
probability at rates above $C$. For the positive statement that it is indeed
possible to so transmit at rates below $C$, a "random coding" argument is required
which in fact establishes the following amazing result: if the signals are selected
at random, with probability one the resulting error probability is arbitrarily
small. The dichotomy at $C$, that for all rates $R$ below $C$ the block error $P_e \to 0$
for almost all long codes ($n \to \infty$) while for rates above $C$, $P_e \to 1$ for almost
all long codes, is exactly like a phase transition. In practice, it turns out that
long codes or signal sets that have enough structure to be readily described,
encoded and decoded, do not approach capacity although the situation seems
to be changing very recently. For more details of the above description see [6]
and [13].
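The statement that the noise vector lies almost surely on a sphere of radius $\sqrt{DN}$ can be checked by simulation. The sketch below assumes $D$ independent zero-mean Gaussian coordinates, each of variance $N$, and uses a fixed seed; the sample sizes are arbitrary illustrative choices.

```python
import math
import random

def noise_norm(dimension, variance, rng):
    """Euclidean norm of a vector of independent zero-mean Gaussian coordinates."""
    sigma = math.sqrt(variance)
    return math.sqrt(sum(rng.gauss(0.0, sigma) ** 2 for _ in range(dimension)))

def relative_radius_stats(dimension, variance, trials, seed=0):
    """Mean and standard deviation of norm / sqrt(D*N) over many noise vectors."""
    rng = random.Random(seed)
    target = math.sqrt(dimension * variance)
    ratios = [noise_norm(dimension, variance, rng) / target for _ in range(trials)]
    mean = sum(ratios) / trials
    spread = math.sqrt(sum((r - mean) ** 2 for r in ratios) / trials)
    return mean, spread

for D in (10, 2000):
    print(D, relative_radius_stats(D, 1.0, 200))
```

As $D = 2TW$ grows, the ratio stays near one while its spread shrinks, which is the statistical regularity that makes the sphere-packing picture accurate for long signal durations.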

Besides communication of information, the problem of estimating a continuous
entity is also of prime concern in this chapter. Consider a Gaussian random
variable $U$ with zero mean (or normalize it away) and variance $\sigma^2$, which is
received in the form $Au$ in Gaussian noise with a possible gain or loss $A$, i.e.,

$$X^{(out)} = Au + n. \tag{23}$$

This may arise in linear modulation, or in the estimation of $U$ in any experiment.
(The word "detection" is commonly reserved for the "estimation" of a discrete
$U$.) If the mean-square error $\epsilon^2$ between the estimate $V = \hat{u}(X^{(out)})$ and $U$
is to be minimized, the best estimate is given by [6,8] the conditional mean

$$E[U|X^{(out)}] = \frac{S/N}{1 + S/N}\, \frac{X^{(out)}}{A}, \qquad \frac{S}{N} = \frac{A^2 \sigma^2}{N}, \tag{24}$$

where $N$ is the noise variance, with resulting

$$\epsilon^2 = \frac{\sigma^2}{1 + S/N} \tag{25}$$

in terms of the a priori variance $\sigma^2$ and a signal-to-noise ratio $S/N$.
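The Gaussian estimation result can be checked by a short Monte Carlo simulation. The sketch below assumes the model $X^{(out)} = Au + n$ with zero-mean Gaussian $u$ of variance $\sigma^2$ and independent Gaussian noise of variance $N$, for which the conditional mean is the linear estimate $[(S/N)/(1+S/N)]\,X^{(out)}/A$ with $S/N = A^2\sigma^2/N$ and the mean-square error is $\sigma^2/(1+S/N)$. The parameter values, seed, and function names are illustrative.

```python
import random

def mmse_estimate(x_out, gain, sigma2, noise_var):
    """Conditional-mean estimate of u from x_out = gain*u + n (all Gaussian)."""
    snr = gain * gain * sigma2 / noise_var
    return (snr / (1.0 + snr)) * x_out / gain

def predicted_mse(gain, sigma2, noise_var):
    """Mean-square error sigma^2 / (1 + S/N) of the conditional-mean estimate."""
    snr = gain * gain * sigma2 / noise_var
    return sigma2 / (1.0 + snr)

def simulated_mse(gain, sigma2, noise_var, trials=100000, seed=1):
    """Empirical mean-square error of the estimate over random draws of u and n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = rng.gauss(0.0, sigma2 ** 0.5)
        x_out = gain * u + rng.gauss(0.0, noise_var ** 0.5)
        total += (mmse_estimate(x_out, gain, sigma2, noise_var) - u) ** 2
    return total / trials

print(predicted_mse(2.0, 1.0, 1.0), simulated_mse(2.0, 1.0, 1.0))
```

Note that the error never exceeds the a priori variance: as $S/N \to 0$ the estimate shrinks to the prior mean and the error tends to $\sigma^2$.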

In the physics literature on quantum information and its applications, the
criterion of mutual information is often used in place of detection or estimation 
error in situations (such as cryptographic eavesdropping) for which no coding 
is possible. Depending on the problem, the best possible outcome of such use 
would be a bound rather than the desired performance criterion. 

2.3 Communication versus measurement 

The estimation/detection problem clearly parallels the problem of producing an
estimate of a desired quantity $u$ from the measured data $X^{(out)}$ in a physical
experiment. More generally, in an estimation problem one is given a fixed
statistical specification $X^{(out)}(t,u)$ and forms in general a nonlinear estimate
$\hat{u}(X^{(out)})$ so that a cost function $C(\hat{U}, U)$ is minimized. For mean-square error,

$$C(\hat{U}, U) = \epsilon^2 = E\left[ |U - \hat{U}|^2 \right], \tag{26}$$

where the expectation $E$ is taken over all the random quantities involved. In a
communication situation, the channel is a statistical transformation $F$ on the
input $X^{(in)}(t,u)$,

$$X^{(out)}(t,u) = F\left[ X^{(in)}(t,u) \right], \tag{27}$$

which reads, for an additive noise channel,

$$X^{(out)}(t,u) = X^{(in)}(t,u) + n(t), \tag{28}$$

for which one can control $X^{(in)}(t,u)$ subject to the system constraints. In
contrast to the estimation problem, a direct optimization approach to a communication
problem with joint transmitter/receiver optimization has never been developed
in a useful way. Instead of asking for the optimum system for a fixed time
duration $T$, $T$ itself is floated as a design parameter in the development of channel
encoding-decoding design subject to Shannon's coding theorems.

It can be seen that a physical measurement is generally not just an
estimation problem, because $X^{(out)}(t,u)$ can be influenced to some extent through
the choice of the physical measurement process, although perhaps not as much
as controlling $X^{(in)}(t,u)$ in (27). In particular, it is a major part of the
measurement system design to find an appropriate physical variable $X^{(in)}$ to couple
to the desired information parameter $u$ to form $X^{(in)}(t,u)$ for information
extraction after the corruption of $X^{(in)}(t,u)$ to $X^{(out)}(t,u)$ by the "channel" is
taken into account. However, there is usually no question of data rate in a
measurement. Thus, physical measurements, which are of prime concern to
us, are described somewhere between communication and detection/estimation.
This situation already obtains in classical measurements, and in cases of fixed
quantum states and quantum measurements. It becomes more so in quantum
communications where the quantum states and quantum measurements can be
freely chosen. As developed in Section 4, the feasibility of choosing quantum
states moves a physical measurement problem away from being a pure
estimation problem to becoming more like a communication problem, although it never
fully becomes a standard communication problem.



[Figure 3 block diagram: SOURCE, QUANTUM TRANSMITTER, CHANNEL, QUANTUM RECEIVER (quantum measurement), DESTINATION; field operators X(t,u) and Y(t,u), states $\rho(u)$ and $\rho_R(u)$.]

Figure 3: Schematic representation of a quantum communication system: the 
message dependence generally enters through the state for the field, but can be 
put in the field operator itself in some instances. 



3 Quantum communication 

3.1 Quantum Versus Classical Communication 

By "quantum communication" we mean more than the study of quantum effects
in communication systems involving classical information transfer. Specifically,
in quantum communications we are concerned with the system performance
under a variety of different quantum measurements and quantum states.
Referring to Fig. 1, the statistical specification of the channel plus transmitter, e.g.,
is given by a conditional probability $p(X^{(out)}|u)$. The classical variable $X^{(out)}$
may well be of quantum origin, say it is the eigenvalue of a quantum observable
obtained in a measurement. However, as far as the analysis of this system is
concerned, the fact that $p(X^{(out)}|u)$ arises from quantum mechanics makes no
difference, and it would proceed just like a classical communication system. If this
$p(X^{(out)}|u)$ arises from a quantum state $\rho(u)$ and a quantum measurement of a
selfadjoint $X^{(out)}$ with eigenstates $|X^{(out)}\rangle$, $p(X^{(out)}|u) = \langle X^{(out)}|\rho(u)|X^{(out)}\rangle$,
one may well ask whether other possible choices of $\rho(u)$ and observable with
resulting different $p(X^{(out)}|u)$ may lead to better performance. These additional
freedoms of quantum measurement and quantum state selection are absent in a
classical communication system. They constitute the new content of quantum
communication.

A general quantum communication system is depicted schematically in Fig. 3.
The channel input and output signals $X^{(in)}(t)$ and $X^{(out)}(t)$ are now field
operators in quantum states $\rho(u)$ and $\rho_R(u)$.

Generally, as indicated by the Dimensionality Theorem (20), a finite number
of modes each with two degrees of freedom, such as an optical mode with two
quadratures, would suffice so that the density operators $\rho$ and $\rho_R$ are
well-defined. A specific classical statistical characterization of the system would
result upon a choice of quantum measurement at the receiver. The most general
characterization of a quantum measurement is the so-called "completely positive
operation measure" with the corresponding measurement statistics given by a
positive operator-valued measure (POM) [16]. Let $O(X^{(out)})$ be a POM on
the channel output, thus

$$\sum_{X^{(out)}} O(X^{(out)}) = I \quad \text{or} \quad \int O(X^{(out)})\, dX^{(out)} = I, \tag{29}$$

and each $O(X^{(out)})$ is a nonnegative selfadjoint operator, with $O(X^{(out)}) =
|X^{(out)}\rangle\langle X^{(out)}|$ for orthogonal $|X^{(out)}\rangle$ in the case of a selfadjoint observable.
The statistics are given by

$$p(X^{(out)}|u) = \mathrm{tr}\, \rho_R(u)\, O(X^{(out)}). \tag{30}$$

The output state $\rho_R(u)$ is determined by the channel action on the input state
$\rho(u)$. The additional quantum "freedoms" in quantum communication consist
in the selection of $O(X^{(out)})$ and $\rho(u)$. Note that it may be more convenient,
as in the case of frequency modulation, to enter the information variable $u$ in
$X^{(in)}(t,u)$ in parallel with the classical case, and specify the transmitter in terms
of $X^{(in)}(t,u)$ and $\rho$ rather than $X^{(in)}(t)$ and $\rho(u)$. In this formulation, the
quantum state $\rho$ and the classical modulation process are decoupled. However,
if the information $u$ enters through the quadratures, it would be necessary to
use $\rho(u)$, for which the quantum state selection and modulation selection are
tied together.
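The two defining properties of a POM, completeness and measurement statistics given by a trace, can be illustrated on the smallest possible example. The sketch below is an assumption-laden toy: it uses a two-level system rather than a field mode, and a three-outcome "trine" POM with elements $(2/3)|\psi_k\rangle\langle\psi_k|$, neither of which appears in the text. It shows a measurement with more outcomes than the dimension of the space, which cannot be described by a selfadjoint observable.

```python
import math

def outer(vec):
    """Projector |v><v| for a 2-component state vector."""
    return [[vec[i] * vec[j].conjugate() for j in range(2)] for i in range(2)]

def scale(factor, matrix):
    """Multiply every entry of a 2x2 matrix by a scalar."""
    return [[factor * matrix[i][j] for j in range(2)] for i in range(2)]

def matrix_sum(matrices):
    """Entrywise sum of a list of 2x2 matrices."""
    return [[sum(m[i][j] for m in matrices) for j in range(2)] for i in range(2)]

def trace_product(a, b):
    """tr(a b) for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

def trine_pom():
    """Three nonnegative operators (2/3)|psi_k><psi_k| that sum to the identity."""
    elements = []
    for k in range(3):
        theta = 2.0 * math.pi * k / 3.0
        psi = [complex(math.cos(theta / 2.0)), complex(math.sin(theta / 2.0))]
        elements.append(scale(2.0 / 3.0, outer(psi)))
    return elements

pom = trine_pom()
rho = [[1.0 + 0j, 0j], [0j, 0j]]          # hypothetical received state |0><0|
probabilities = [trace_product(rho, element).real for element in pom]
print(matrix_sum(pom))
print(probabilities, sum(probabilities))
```

The probabilities are nonnegative and sum to one for any density operator because the elements are nonnegative and sum to the identity, which is all that is required of a POM.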

Historically, the serious study of optical communication began immediately 
after the laser was experimentally realized, for which quantum effects are clearly 
important as $\hbar\omega/k$ corresponds to a temperature far above ambient. While the evaluation of system performance went
on for coherent states and the three standard measurements: direct, homodyne, 
and heterodyne detections, quantum communication theory in our sense was 
also developed. Forney [17] and Gordon [18] proposed the entropy bound for
information transfer with a fixed set of states valid for arbitrary measurement, to
be discussed in section 3.3. Helstrom [19] studied the quantum measurement
optimization problems in the spirit of classical detection/estimation theory, which
were further developed by Holevo [20] and Yuen [21]. Each of the above three 
standard measurements corresponds, respectively, to the quantum measurement 
of photon number, single field quadrature, and joint quadratures described by a 
POM but not a selfadjoint operator [22]. In such work, which actually has many 
applications in physics [23] but will not be further discussed in this chapter, the 
states are fixed and the quantum measurement is selected so that the resulting 
classical statistical system leads to the best possible performance compared to 
other measurements. The issue of quantum channel representation was treated
[4,24,25] and the possibility of receiver state control was suggested [2,4,25,26,27].
The general problem of transmitter quantum state selection was considered by
Yuen [3,4], leading to the development of TCS as indicated in Section 1. The
question of optimal state influence on channel capacity was also implicit in con- 
nection with the application of the entropy bound, indicating that number states 
and photon counting are best for free boson fields [4,28]. For recent advances 
in determining the capacities and error exponents of various quantum channels 
by Holevo and the Hirota group, see [29-31]. For other advances including work 
on quantum tomography by the D'Ariano group, see [32-34]. For applications 
of squeezed states to quantum cryptography, see [35, 36]. 






3.2 Mutual information 



The capacities, or mutual informations maximized over the input distributions,
for various boson channels are discussed extensively in [37]. Here I would like to
focus on five capacities for the narrowband free boson channel under an average
power constraint: number state and photon counting, TCS and homodyning, and
coherent state with each of the three standard measurements. It should become
clear within this and the next subsection that these are the most important
cases capturing the essence of the situation.

For the free electromagnetic field at optical frequencies, all the current or
foreseeable future systems are narrowband, i.e., the available bandwidth is only
a small fraction of the center frequency. Due to various facts of nature, it would
be extremely difficult and inefficient to utilize photons at higher frequencies, say
X-rays, in a communication situation. Thus, there is no practical significance in
studying wideband photonic channels. The constraint of average power can be
separated into two parts: average with respect to the statistics of the information
variable $U$ and average with respect to the quantum nature of the state $\rho$. In
either case a peak power (or energy or power spectral density) constraint can also
be applied. In the case of classical signals the peak power constraint is indeed
quite meaningful and realistic, but is often hard to handle mathematically and
usually avoided. In the case of quantum states, a peak energy constraint would
cut off the Hilbert space of states $\mathcal{H}$ at a maximum number state eigenvalue
$n_m$ so that $\mathcal{H}$ becomes finite-dimensional. This, however, is unrealistic or at
least hard to handle in so far as one considers a coherent state $|\alpha\rangle$, which has
components in all $|n\rangle$, to be realizable. Some discussion on this issue is given
in [10], although in its full scope it is a complicated and profound issue. Here I
would advocate, if only on the ground of mathematical convenience, that the energy
constraint is to be applied to the quantum state average $\mathrm{tr}\,\rho\, a^\dagger a$, and not to yield
an $n_m$.

Let $P = h f_0 W S$ be the available signal power of a narrowband channel
of center frequency $f_0$ and photon number $S$ per mode. The photon number
capacity is [28,37,38]

    $C_{op} = W[(S + 1)\log(S + 1) - S\log S]$.    (31)

For TCS with homodyne detection [4,37],

    $C_{TCS} = W\log(1 + 2S)$.    (32)

If both quadratures of the TCS are utilized under the same power constraint
with optimized TCS-heterodyne [22] or joint quadrature measurement, it can
be shown from the Kuhn-Tucker optimality conditions of nonlinear programming
that as $S$ is increased from 1 the optimum capacity is indeed achieved
through utilization of only one quadrature. The coherent state heterodyne and
homodyne capacities are

    $C_{het} = W\log(1 + S)$,    (33)
    $C_{hom} = \frac{W}{2}\log(1 + 4S)$.    (34)

Equations (32)-(34) are easily derived from (15) because the corresponding
channels are AWGN ones: the fluctuations in a TCS or a coherent state, which
is merely a classical amplitude superposed on vacuum, behave as independent
additive Gaussian noises, and are white noises under the narrowband
assumption. The coherent-state photon counting capacity $C_{ph}$ does not have a
simple closed form but is readily computed numerically [39]. These five
capacities are compared numerically in the figure reproduced from [4], for a fairly wide
bandwidth. It may be observed that $C_{TCS}$ is always larger than $C_{het}$ and $C_{hom}$,
and is also larger than $C_{ph}$ in the case of more than a fraction of a photon per
mode.
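
As a rough numerical illustration of the closed forms (31)-(34), the following sketch evaluates the four capacities for $W = 1$ with natural logarithms, so the values are in nats per mode. The function names are illustrative choices only, and the coherent-state photon counting capacity is left out because it has no closed form.

```python
import math

def c_op(s, w=1.0):
    """Number state with photon counting, eq. (31)."""
    if s == 0:
        return 0.0  # limit of s*log(s) as s -> 0
    return w * ((s + 1) * math.log(s + 1) - s * math.log(s))

def c_tcs(s, w=1.0):
    """TCS with homodyne detection, eq. (32)."""
    return w * math.log(1 + 2 * s)

def c_het(s, w=1.0):
    """Coherent state with heterodyne detection, eq. (33)."""
    return w * math.log(1 + s)

def c_hom(s, w=1.0):
    """Coherent state with homodyne detection, eq. (34)."""
    return 0.5 * w * math.log(1 + 4 * s)

if __name__ == "__main__":
    print("     S    C_op   C_TCS   C_het   C_hom")
    for s in (0.1, 1.0, 10.0, 100.0):
        print(f"{s:6.1f} {c_op(s):7.3f} {c_tcs(s):7.3f} "
              f"{c_het(s):7.3f} {c_hom(s):7.3f}")
```

For every $S > 0$ the printed rows should show $C_{op} \ge C_{TCS}$, with $C_{TCS}$ above both coherent-state values yet below $2C_{hom}$.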

However, the difference between $C_{TCS}$ and $C_{het}$, $C_{hom}$ is not big. Indeed,
$C_{TCS}$ is less than twice $C_{hom}$ although the SNR of TCS is the square of that
for coherent states. Because the data rate for a mode goes as $\log(1 + \mathrm{SNR})$ from
(15), the square in SNR becomes less than a multiplicative factor of 2. The
difference between $C_{TCS}$ and $C_{het}$ is even less: (32) is equivalent to doubling
the signal power in $C_{het}$ with the same bandwidth. The underlying reason can
be understood as follows. In the geometric representation of signal and noise 
sketched in section 2.2, it can be seen that the effect of noise is to move a given 
signal point away from its position. If the noise is big enough, it would move it 
closer to another signal point B as compared to the original point A, and the 
optimum receiver would decide it is this other signal B that was transmitted, 
hence making an error, as illustrated in Fig. |[ Thus, a good system would have 
the signal points as far apart as possible from the viewpoint of errors, and have 
as many signal points as possible from the viewpoint of data rate, two conflicting 
goals. For a fixed dimension D ~ 2WT, a larger power P yields a larger sphere 
and the same number of M signal points can be placed further apart inside 
the sphere, leading to a smaller error for a fixed noise power N. Increasing W, 
however, is more beneficial than increasing P, thus Chet > Chom as W increases, 
even though Chom has a bigger SNR. To see the role of W versus P, recall the 
discussion around ( ^l| ) and ( p^ ) that one wants the noise spheres around different 
signal points to be almost nonoverlapping to yield small error probability. As 
a result of this "sphere packing," the number of well distinguished signals is 
roughly the ratio of the signal plus noise volume to the noise volume. The 
volume $V_D(r)$ of a $D$-dimensional sphere of radius $r$ is $B_D r^D$ for a $D$-dependent
constant $B_D$, which implies

    $V_D(\sqrt{D(P+N)})/V_D(\sqrt{DN}) = (1 + P/N)^{D/2}$    (35)

since the radii of the signal plus noise and noise spheres are $\sqrt{2TW(P+N)}$ and
$\sqrt{2TWN}$ respectively. The quantity (35) grows exponentially in $D$ or $W$ but
only to a fixed power in $P$. This more important role of $W$ versus $P$ clearly
manifests in (15) and (16).
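
The different roles of $W$ and $P$ in (35) can be seen with a few numbers. The sketch below evaluates the logarithm of the volume ratio, $(D/2)\log(1 + P/N)$ with $D = 2WT$, while holding the noise power $N$ fixed as in the argument above. The function name and the sample values are illustrative assumptions.

```python
import math

def log_distinguishable_signals(w, t, p, n):
    """Natural log of the volume ratio (35) with D = 2WT dimensions."""
    d = 2.0 * w * t
    return 0.5 * d * math.log(1.0 + p / n)

base = log_distinguishable_signals(w=1.0, t=1.0, p=10.0, n=1.0)
double_w = log_distinguishable_signals(w=2.0, t=1.0, p=10.0, n=1.0)
double_p = log_distinguishable_signals(w=1.0, t=1.0, p=20.0, n=1.0)

print(f"baseline   log M = {base:.3f}")
print(f"doubled W  log M = {double_w:.3f}")
print(f"doubled P  log M = {double_p:.3f}")
```

Doubling $W$ doubles $\log M$, whereas doubling $P$ adds less than $\log 2$ per unit of $WT$.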

Having understood why the apparent large gain in SNR given by (||) for TCS 
leads only to a small gain in capacity, the question becomes whether TCS would 
be significant in improving optical communications compared to coherent states. 
The rest of this Section 3 is devoted to a detailed examination of this issue. We
may first observe that complicated coding, especially the decoding process, is
required to approach the capacity given by any of (32)-(34). If one looks at the
error behavior of information transfer under a specific simple signaling scheme,
e.g., the antipodal signals discussed in [5], the full SNR square advantage may
appear. That is, more restricted "capacities" than (32)-(34) may show a large
advantage with TCS. In Section 3.3 we will see that the number state capacity
$C_{op}$, which is so close to $C_{TCS}$, is actually the optimum rate for any states and
measurements subject to the average power constraint. This capacity $C_{op}$ can be
obtained without the need for complicated decoding because the ideal number
state channel is noiseless: there is no need to use long sequences to yield
statistical regularity. Thus, the use of number states can be considered as an
alternative to channel coding. Number states, as intensity squeezed states, have
a lot of similarity to TCS in regard to their physical generation and propagation
characteristics. Unfortunately, the use of such nonclassical states as information
sources would not be advisable in practical communication systems. In addition
to various problems of a more practical nature, such as phase coherence for TCS
and good detectors for number states, the inevitable presence of significant loss
would wipe out the advantage of nonclassical states. This issue will be treated in
section 3.4 after the following discussion of the entropy bound that established
the optimality of $C_{op}$ given by (31).

3.3 The entropy bound 

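Given a fixed set of density operators $\rho_\lambda$ dependent on a discrete or continuous
random variable $\Lambda$ with probability (density) $p(\lambda)$, define

    $\rho = \sum_\lambda p(\lambda)\rho_\lambda$  or  $\int p(\lambda)\rho_\lambda\, d\lambda$.    (36)

Let $S(\rho) = -\mathrm{tr}\,\rho\log\rho$ be the von Neumann entropy of $\rho$, and let $X^{(\mathrm{out})}$ be the
outcome of the POM giving the measurement probability. Then the mutual information
between $\Lambda$ and $X^{(\mathrm{out})}$ is bounded by

    $I(\Lambda; X^{(\mathrm{out})}) \le S(\rho) - \bar{S}(\rho_\lambda)$,    (37)
    $\bar{S}(\rho_\lambda) = \sum_\lambda p(\lambda) S(\rho_\lambda)$  or  $\int p(\lambda) S(\rho_\lambda)\, d\lambda$.    (38)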

This entropy bound (37), first given by Forney [17] and Gordon [18], was proved
for finite discrete $\Lambda$ and finite dimensional Hilbert space $\mathcal{H}$ by Zador [40] and
independently by Holevo [41], and for general $\Lambda$ and infinite dimensional $\mathcal{H}$ by
Ozawa [38]. The long complicated history of this bound is outlined in [10].
Recently, equality in (37) was shown to be achievable if the measurement
can be made over a long sequence of states instead of just symbol by symbol
in the sequence [42,43]. However, while this may be considered to establish
the capacity of a quantum channel defined by the mapping $\lambda \mapsto \rho_\lambda$, such a
specification of a quantum channel is neither general nor practical. The main
reason is that there is no way to tell whether the particular map $\lambda \mapsto \rho_\lambda$
is optimum under the constraint of the problem. As we have emphasized in
section 3.1, a general quantum communication problem involves both the choice
of states and measurements. Under an average energy constraint for a single
mode,

    $\sum_\lambda p(\lambda)\, \mathrm{tr}\, \rho_\lambda a^\dagger a \le S$,    (39)

one cannot tell a priori what the optimal $\lambda \mapsto \rho_\lambda$ should be, even if one assumes
all $\rho_\lambda$ are coherent states.

On the other hand, the bound (37) in its full generality readily shows [38]
that for a single boson mode under (39), the maximum $I(\Lambda; X^{(\mathrm{out})})$ is achieved
by taking $\Lambda$ and $X^{(\mathrm{out})}$ as nonnegative integers, with number states
$\rho_{\lambda=n} = |n\rangle\langle n|$ and the $I$ value given by (31) for $W = 1$. The wideband capacity can
be similarly derived [38]. Note that apart from showing that more general
processing such as feedback would not increase $I$, this joint optimization over
state/measurement and modulation (the map $\lambda \mapsto \rho_\lambda$) and demodulation (the map
from $X^{(\mathrm{out})}$ to the estimate of $\Lambda$) does establish that (31) is the ultimate quantum limit on the
possible rate of information transfer for a boson mode of average energy $S$.
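
The link between the bound (37) and the capacity (31) can be checked numerically. Number states are pure, so the average entropy term in (37) vanishes and the bound reduces to $S(\rho)$ for a $\rho$ that is diagonal in the number basis. It is a standard result that, at fixed mean photon number $S$, this entropy is largest for the geometric (Bose-Einstein) distribution. The sketch below, in which the function names and the truncation `n_max` are assumptions made for illustration, compares a direct entropy sum with the right-hand side of (31) at $W = 1$.

```python
import math

def g(s):
    """Right-hand side of (31) for W = 1, in nats."""
    return (s + 1) * math.log(s + 1) - s * math.log(s)

def geometric_entropy(s, n_max=5000):
    """Shannon entropy of p(n) = (1 - q) q**n with mean s, truncated at n_max."""
    q = s / (s + 1.0)
    h = 0.0
    for n in range(n_max):
        p = (1.0 - q) * q ** n
        if p > 0.0:  # skip terms that underflow to zero
            h -= p * math.log(p)
    return h

for s in (0.5, 1.0, 5.0):
    print(f"S = {s}: entropy sum = {geometric_entropy(s):.6f}, eq. (31) = {g(s):.6f}")
```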

3.4 Effect of loss 

The effect of linear loss on a mode can be represented as a transformation on
the modal photon annihilation operator [2,4,44-46],

    $b = \eta^{1/2} a + (1 - \eta)^{1/2} d$,    (40)

where $\eta$ is the transmittance and the $d$-mode is in vacuum. Note that the effect
of quantum efficiency on a detector can be so represented as well. It follows
immediately from (40) that the resulting quadrature fluctuation in $b$ has a floor
level $(1 - \eta)/4$, which is essentially the coherent state noise level for $\eta \ll 1$.
Similarly for a number state, the $b$-mode photon number fluctuation

    $\langle \Delta N_b^2 \rangle = \eta^2 \langle \Delta N_a^2 \rangle + \eta(1 - \eta)\langle N_a \rangle$    (41)

contains a partition noise equal to the mean $\langle N_b \rangle = \eta\langle N_a \rangle$ for $\eta \ll 1$, washing
away the sub-Poissonian character of the $a$-mode. Generally, one can readily
show from (40) that the state $\rho_b$ is very close to a coherent state of mean $\eta^{1/2}\langle a \rangle$
for large loss, thus any nonclassical state becomes essentially classical.
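
A short numerical sketch makes the washing-out effect concrete. It applies (41) to a number state input and the quadrature version of (40) to a squeezed input, using the convention in which the vacuum quadrature variance is 1/4. The function names, the ten-photon number state and the input variance 0.025 (10 dB below the vacuum level) are illustrative assumptions.

```python
def number_variance_after_loss(eta, var_a, mean_a):
    """Photon number variance of the b-mode from eq. (41)."""
    return eta ** 2 * var_a + eta * (1.0 - eta) * mean_a

def fano_factor_after_loss(eta, var_a, mean_a):
    """Variance-to-mean ratio of the b-mode; 1 is the Poisson value."""
    return number_variance_after_loss(eta, var_a, mean_a) / (eta * mean_a)

def quadrature_variance_after_loss(eta, var_a1):
    """Quadrature variance of b implied by eq. (40); 1/4 is the vacuum level."""
    return eta * var_a1 + (1.0 - eta) / 4.0

for eta in (1.0, 0.5, 0.1, 0.01):
    fano = fano_factor_after_loss(eta, var_a=0.0, mean_a=10.0)
    quad = quadrature_variance_after_loss(eta, var_a1=0.025)
    print(f"eta = {eta:5.2f}: Fano factor = {fano:.3f}, quadrature variance = {quad:.4f}")
```

For a number state the Fano factor after loss is $1 - \eta$, so it approaches the Poisson value 1 as $\eta \to 0$, while the squeezed quadrature variance approaches 1/4.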

The implication of this fact on the utility of nonclassical states is profound, 
especially in engineering applications where significant loss is usually present, 
e.g., in fiber optic communications. Unless a special environment is created 
[4] to compensate for the squeezing or nonclassical effect in the presence of 
loss, there is no way to keep a nonclassical state at the reception end. While 
this is possible in principle, it seems that it is not worth the trouble. Even in
scientific experiments or in the process of nonclassical state generation, loss 
places a severe limit on the amount of squeezing obtainable. The sensitivity of 
nonclassical states to loss and interference would place strenuous requirements 
on all the system components, making any such system extremely difficult to 
implement. To me, a similar kind of argument leads to a similar implication in 
the field of quantum information. 

Given the close value of $C_{TCS}$ to $C_{het}$ in (32)-(33) in the absence of loss, it
should be clear that there is hardly any advantage left in the presence of loss.
While the ultimate quantum capacity $C_\ell$ of a lossy channel is not known, an
upper bound on $C_\ell$ can be easily derived. Under an average energy constraint $S$
and loss $\eta$, equation (31) for $C_{op}$ with $S$ replaced by $\eta S$ would provide a bound
on $C_\ell$. The gap between $C_{op}$ and $C_{het}$ with $\eta S$ is the largest gain, probably
not actually achievable, that one can possibly obtain with nonclassical states
in a lossy channel. The smallness of this gap, as seen from the figure comparing
the capacities, shows that there is little significance in pursuing quantum
communications with nonclassical sources in practice, a conclusion I drew over
twenty years ago.
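
The size of this gap is easy to tabulate. The sketch below evaluates $C_{op} - C_{het}$ at received photon number $\eta S$ for $W = 1$ in nats per mode; the function names and sample values are illustrative assumptions. Algebraically the gap equals $x\log(1 + 1/x)$ with $x = \eta S$, which stays below 1 nat per mode for every $x > 0$.

```python
import math

def c_op(x):
    """Eq. (31) at W = 1, evaluated at photon number x."""
    return (x + 1) * math.log(x + 1) - x * math.log(x)

def c_het(x):
    """Eq. (33) at W = 1, evaluated at photon number x."""
    return math.log(1 + x)

def max_gain(s, eta):
    """Largest possible gain over coherent-state heterodyne for loss eta."""
    x = eta * s
    return c_op(x) - c_het(x)

for s, eta in ((10.0, 1.0), (10.0, 0.1), (1000.0, 0.01), (1000.0, 1.0)):
    print(f"S = {s:7.1f}, eta = {eta:4.2f}: gap = {max_gain(s, eta):.4f} nats per mode")
```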

3.5 Quantum amplifiers and duplicators

Not all is lost, however. As to be presently explained, the use of novel quantum 
amplifiers and related devices on coherent state sources can lead to a number 
of significant communication applications not possible with the usual phase- 
insensitive linear amplifier (PIA) . A characteristic feature of these novel devices 
is that their outputs are often nonclassical states for coherent state inputs, even 
though it is not the nonclassical nature of these states that is relevant in the 
application. 

Corresponding to the three standard quantum measurements are three quantum
amplifiers, the photon number amplifier (PNA), the phase-sensitive linear
amplifier (PSA), and the PIA. If $b$ and $a$ are the output and input modal photon
annihilation operators of the amplifier, these three amplifiers can be represented
as [45-48], with a power gain $G > 1$,

    PIA    $b = G^{1/2} a + (G - 1)^{1/2} v^\dagger$,  $[v, v^\dagger] = 1$    (42)
    PSA    $b_1 = G^{1/2} a_1$,  $b_2 = G^{-1/2} a_2$    (43)
    PNA    $b^\dagger b = G a^\dagger a$,  $G$ integer    (44)

A fourth quantum phase amplifier [49,50] is 

    QPA    $e_+^{(b)} = (e_+^{(a)})^G$,  $e_+ = (a^\dagger a + 1)^{-1/2} a$,    (45)

which is related to the ideal phase measurement [19,21,51] described by a POM 
involving the Susskind-Glogower states and corresponding phase-coherent states 
[51]. 

Table 1

DETECTION        AMPLIFIER    STATES    DUPLICATORS
heterodyne       PIA          CS        BQD
homodyne         PSA          TCS       SQD
direct           PNA          NS        PND
phase (ideal)    QPA          PCS       QPD

(The column on states merely emphasizes that the nature of these states would 
be preserved only by the corresponding amplifier, not that the amplifier is noise- 
less only for those states.)

Abbreviations for Table and Text

CS     coherent state
NS     number state
TCS    two-photon coherent state
PCS    phase coherent state
PIA    phase-insensitive linear amplifier
PSA    phase-sensitive linear amplifier
PNA    photon-number amplifier
QPA    quantum phase amplifier
PND    photon number duplicator
POA    photon on-off amplifier
QND    quantum nondemolition measurements
POM    positive operator-valued measure
SQL    standard quantum limit

In (44) and (45), the photon operator $b$ has to be defined on two modes.
For a fuller discussion of these amplifiers, see [48] and [49], which also contains
an extensive treatment of the duplicators to be discussed later in this section. The
main point about (42)-(45) is that the amplifier output of each is, for the
corresponding measurement, a perfect "noiseless" scaled (amplified) version of the
input for arbitrary input state, i.e., they are noiseless amplifiers for the
corresponding detection scheme. Thus, the often found statement that a quantum
amplifier necessarily introduces noise, say in the sense of having a noise figure
$F = \mathrm{SNR}_a/\mathrm{SNR}_b > 1$, is wrong. As summarized in Table 1, if the proper amplifier
matching the measurement is used there is no additional noise ideally, similar 
to the classical case. All the noise then arises inherently from the quantum 
nature of the input. (This is also true in both balanced and unbalanced homo- 
dyne/heterodyne detection for which the effective amplifier, the local oscillator, 
introduces no noise in the high gain limit. See [52]. It is a pervasive miscon- 
ception that the noise in homodyne/heterodyne detection is local-oscillator shot 
noise.) Without going into a detailed exposition, this is actually clear intuitively
from the basic principles of quantum physics. When you fix a measurement, the
situation is classical for any given state as discussed in Section 3.1 on quantum
vs. classical communication, in the sense that a fixed probabilistic description 
is obtained. The situation is a little more subtle in the case of POM rather than 
selfadjoint operator, but can be understood by analyzing the POM as commut- 
ing selfadjoint operators measurement on an extended space which can always 
be done [20]. 

The generation mechanism of PSA is identical to quadrature squeezing, 
which, being piecewise linear, is not exactly a nonlinear effect. On the other 
hand, PNA, QPA and the duplicators involve truly nonlinear quantum effects
[47-50], which will not be discussed here. None of these new quantum
devices except PSA has been successfully demonstrated experimentally in a 
useful manner. 

At this point, I would like to address a confusing point about the capability of
amplifiers. It is often stated that an amplifier at the receiver could improve the
receiver performance. The optimum receiver performance is determined by the
specification $\{X^{(\mathrm{out})}(t), \rho(u)\}$ in Fig. ^ Nothing, and certainly no amplifier,
can ever improve that as a matter of tautology. What can be improved is a 
specific receiver structure that does not lead to the optimum performance. In 
such a case, the use of an amplifier or some other device may improve the 
suboptimum receiver performance. This point is related to, but different from, 
the so-called data processing theorem [7] in information theory which shows 
that no processing can increase the information transfer over a channel. 

The above amplifiers can be used as pre-amplifiers to suppress subsequent 
receiver noise in the corresponding detections, in either engineering or scientific 
applications. They can also be used to advantage [53] in the attempt to create 
a transparent optical local area network. For such a purpose, however, the
duplicators [46-48,54] would be perfect. A photon number duplicator (PND) is
a device with one input $a$ in state $\rho_a$ and two outputs $b$, $c$ such that each of the
output photon counting statistics is the same as that of the input,

    $\langle n|\rho_a|n\rangle = \langle n|\rho_b|n\rangle = \langle n|\rho_c|n\rangle$.    (46)

Typically, the output photon counts for the b and c modes are perfectly corre- 
lated, thus PND also provides a perfect realization of a photon number quantum 
nondemolition measurement (QND) with only a finite energy [47]. Single and 
double quadrature duplicators can be similarly described. 

The amplifiers can be used as line amplifiers in long distance optical fiber 
communications. For example, the use of PSA not only improves the SNR by a 
factor of 2 for coherent state sources in a long amplifier chain, it also significantly 
reduces the Gordon-Haus soliton timing error [55]. Considerable experimental 
progress [56] has been made on such possible application, but the required phase 
coherence renders it impractical. For on-off signals, the use of PNA leads to the 
following error probability

    $P_e = \frac{1}{2}\exp\{-S[1 - f_n(G)]\}$,    (47)

where the functions $f_n(G)$ obey the recurrence relation

    $f_{n+1}(G) = (1 - G^{-1})^G [1 + (G - 1)^{-1} f_n(G)]^G$    (48)

with $f_0(G) = 0$. Equations (47)-(48) apply to a chain of $n$ amplifiers of gain
$G$ and loss $G^{-1}$ between two adjacent amplifiers, assuming direct detection. In
Fig. 5, this error exponent $1 - f_n(G)$ is compared with that of the PIA line
obtained under the Gaussian approximation for direct detection.
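
The recurrence is simple to iterate. The sketch below is a minimal illustration in which the function names are assumptions; it computes $f_n(G)$ and the on-off error probability for a PNA chain. The update is written as $[(1 - 1/G) + f_n/G]^G$, which is algebraically the same as the product form $(1 - G^{-1})^G[1 + (G-1)^{-1} f_n]^G$ because $(1 - 1/G)/(G - 1) = 1/G$, and it avoids dividing by $G - 1$.

```python
import math

def f_chain(n, gain):
    """f_n(G) from the PNA chain recurrence, starting from f_0 = 0."""
    f = 0.0
    for _ in range(n):
        # Equivalent to (1 - 1/G)**G * (1 + f / (G - 1))**G.
        f = ((1.0 - 1.0 / gain) + f / gain) ** gain
    return f

def pna_error_probability(s, n, gain):
    """On-off error probability 0.5 * exp(-S * (1 - f_n(G)))."""
    return 0.5 * math.exp(-s * (1.0 - f_chain(n, gain)))

for n in (0, 1, 2, 5, 10, 50):
    print(f"n = {n:2d}: error exponent 1 - f_n(2) = {1.0 - f_chain(n, 2):.4f}")
```

With $n = 0$ the expression reduces to $\frac{1}{2}e^{-S}$, the ideal direct detection result for on-off coherent states.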

As can be seen in the figure, even more improvement, in fact the optimum
improvement, is obtained with the use of a photon on-off amplifier [57] (POA)
tailored for the situation. In the state description, a POA acts on two modes
but for the input mode it reads

    POA    $|0\rangle \to |0\rangle$,  $|1\rangle \to |\alpha\rangle$    (49)

where $|\alpha\rangle$, $|0\rangle$ are the two on-off coherent states, $S = |\alpha|^2$. The resulting error
probability is

    $P_e = \frac{1}{2}[1 - (1 - e^{-S})^n]$,    (50)

which is the same as that obtained by a repeater, i.e., by direct detection and
retransmission at each of the $n$ stages. In general, it is possible to write down
a perfect quantum amplifier for any given signaling and detection scheme which
performs as well as a repeater, although the actual installation of POA or any
such amplifiers in a long line would entail the loss of flexibility, as compared to
PNA, for adapting to other signaling schemes.

Figure 5: Comparison of error exponents $-S^{-1}\ln 2P_e$ as a function of stages
$n$. The PIA line exponent is independent of $S$ and $G$, the PNA exponent is
independent of $S$ and the POA exponent is independent of $G$.

Quantum amplifiers are also useful in quantum cryptography [48]. A major
problem of the quantum cryptographic schemes is that the signals cannot be
amplified to compensate for the loss without disrupting the operation of the scheme.
In [58] a new quantum cryptographic scheme is introduced that allows
amplification, which greatly extends the distance over a fiber for which the scheme
works.

4 Ultimate limit on measurement accuracy



4.1 Measurement System and Ultimate Performance 

In this section 4 the question is addressed of the ultimate limit, quantum as well as
classical, on the measurement accuracy obtainable with various measurement
systems. The optimum performance ideally achievable with a measurement
system is of course an important piece of design information, but more
importantly I would like to assess the potential of such systems, and ways to
realize them in principle, in order to explore the feasibility of developing
ultra-high precision measurement systems important in many applications, especially
scientific ones. My approach [58] is based on the communication characterization
of measurement discussed so far, especially in section 2.3, while adopting
quantum and classical communication theory to provide the answers. Since the
correspondence between communication and measurement is not exactly
isomorphic, we will find that it is possible to obtain limits on the measurement
accuracy, but not always possible to be assured that those limits are attainable.
Indeed, even if the correspondence is perfect, there are still additional
questions, such as what systems are actually available, that would resist a
complete mathematical characterization in the foreseeable future. Nevertheless, as
to be discussed presently, some of the results obtained are somewhat surprising
and also promising. In the next section 4.2 the rate distortion limit in classical
communication theory will be explained, and in section 4.3 the corresponding
quantum limits will be presented. Here I would like to first highlight the results
and their implications.

The final error in a measurement system may depend, even in principle 
excluding nonideal environmental perturbations, on more than a single source 
or variety. For example, in the detection of very weak gravitational radiation by 
a Michelson interferometer, the radiation pressure error needs to be added to the 
photon detection error to form the total error. The application of squeezed states
in this situation is treated elsewhere in this book and will not be discussed.
Here, the general theory will be illustrated only with a measurement medium
or channel that can be characterized as a free boson field, so that the results in
section 3 may be utilized. The general approach, however, is applicable to any
specific measurement system.

Consider the problem of estimating a parameter $U$ with Gaussian density
$p_G(u)$ of zero mean and variance $\sigma^2$ via a single mode optical field of average
energy S. While the optimization of (0) yields TCS as the solution, two choices 
have already been fixed in advance: the parameter u is to be modulated into 
the mean $\alpha_1$ of the state, and homodyning or measurement of $a_1$ is to be
performed. If one relaxes these two conditions in accordance with the general
quantum communication approach of section 3.1, one may pick a state $\rho(u)$
subject to

    $\int du\, p_G(u)\, \mathrm{tr}\, \rho(u) a^\dagger a \le S$    (51)

and a general measurement represented by a POM with outcome $y$, so that the
mean-square error between $y$ and $u$ is minimized. It is not clear at all that the
combination of linear modulation, TCS and homodyne is the optimum solution
or yields a near optimum performance to this problem. As the following
development shows, a lower bound for the root-mean-square error
$\delta u = \langle\epsilon^2\rangle^{1/2}$ under (51) can be derived,

    $\delta u \ge \sigma\, \frac{S^S}{(S+1)^{S+1}} \sim \frac{\sigma}{eS}$,    (52)

which is very close to the TCS linear modulation performance,

    $\delta u \sim \frac{\sigma}{2S}$.    (53)

Partly because it is not even clear whether the lower bound (52) can indeed
be achieved, one would ordinarily be quite satisfied with the difference between
$1/2$ and $1/e$ and stop looking for another system unless the TCS system is not
practical for whatever reason. One may say the linear TCS system is essentially
optimum. The corresponding coherent state performance is $\delta u \sim \sigma/2\sqrt{S}$.
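
The comparison can be put in numbers under one reading of the argument: for a Gaussian parameter with $R(d) = \frac{1}{2}\log(\sigma^2/d)$, a channel of capacity $C$ nats per use gives the rate distortion limit $\delta u \ge \sigma e^{-C}$. The sketch below uses the single-mode capacities of section 3.2 with $W = 1$. The function names are illustrative, and the expressions are my reconstruction from those capacities rather than formulas quoted from the text.

```python
import math

def quantum_lower_bound(s, sigma=1.0):
    """sigma * exp(-C_op): equals sigma * S**S / (S + 1)**(S + 1)."""
    c = (s + 1) * math.log(s + 1) - s * math.log(s)
    return sigma * math.exp(-c)

def tcs_linear(s, sigma=1.0):
    """sigma * exp(-C_TCS) = sigma / (1 + 2S) for TCS with homodyne."""
    return sigma / (1.0 + 2.0 * s)

def coherent_linear(s, sigma=1.0):
    """sigma * exp(-C_hom) = sigma / sqrt(1 + 4S) for coherent state homodyne."""
    return sigma / math.sqrt(1.0 + 4.0 * s)

for s in (1.0, 10.0, 100.0, 1.0e4):
    lb, tcs, cs = quantum_lower_bound(s), tcs_linear(s), coherent_linear(s)
    print(f"S = {s:8.1f}: bound = {lb:.3e}, TCS = {tcs:.3e}, "
          f"coherent = {cs:.3e}, TCS/bound = {tcs / lb:.4f}")
```

The ratio of the TCS error to the bound tends to $e/2$ for large $S$, which is the difference between the factors $1/2$ and $1/e$, while the coherent state error falls only as $1/\sqrt{S}$.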

For a uniformly distributed phase parameter $\phi \in [-\pi, \pi)$, the corresponding
lower bound for the root-mean-square error is

    $\delta\phi \ge A\, \frac{S^S}{(S+1)^{S+1}} \sim \frac{A}{eS}$,    (54)

where $A \sim 1.35$. This single-mode $1/S$ behavior, improved over the $1/\sqrt{S}$
dependence for coherent states, has been obtained previously for two different
concrete systems utilizing TCS [60,61] and number states [62]. This should not
be surprising given the closeness between the number state and TCS capacities,
(31)-(32).

In the case of a narrowband field with $m = D/2 = WT$ modes but the same
total energy or photon number $S$, the lower bound for the measurement of a
Gaussian $U$ is

Su > a— -J- — (55) 

^ (l + ^)"^e-^, (56) 



For the uniform phase, $\delta\phi$ is given by (55) with $\sigma$ replaced by $A$ as in (52)
and (54). Note that $\delta u$ goes to zero as $D \to \infty$. This is because infinite
capacity is obtained when the narrowband expression is extrapolated to infinite
bandwidth, which is not physically meaningful as the $hf$ dependence becomes
important [21,37]. The capacity is finite when such dependency is taken into
account [28,37-38].
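
The trend with the number of modes can be illustrated under two assumptions of mine: the total photon number $S$ is split equally over the $m$ modes, and each mode carries the number state capacity of (31) at $W = 1$, so that the rate distortion limit for a Gaussian parameter is $\sigma e^{-C}$. The function names are illustrative.

```python
import math

def multimode_capacity(s_total, m):
    """Total capacity in nats for m modes with s_total / m photons each."""
    x = s_total / m
    return m * ((x + 1) * math.log(x + 1) - x * math.log(x))

def multimode_bound(s_total, m, sigma=1.0):
    """Rate distortion lower bound sigma * exp(-C) on the rms error."""
    return sigma * math.exp(-multimode_capacity(s_total, m))

for m in (1, 2, 5, 10, 100, 1000):
    print(f"m = {m:4d}: bound on rms error = {multimode_bound(10.0, m):.3e}")
```

For fixed total energy the bound keeps decreasing as $m$ grows, in line with the remark that it goes to zero as $D \to \infty$ in the narrowband extrapolation.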

The result (55) or (56) is rather remarkable. For the same total energy $S$
spread over a large number of modes, the performance can be improved from 
l/S* to assuming the bound can be approached. As a communication limit 
for which one can control the modulation, this is certainly the case. Intuitively,
it can be traced to the relative importance of $D$ or $W$ over $P$ or $S$ as discussed
in sections 2.1 and 3.2. As the quantum state affects only the SNR and not $D$,
one may expect such behavior for coherent states also, which is indeed borne out
to be the case as developed in section 4.3. Even in the measurement situation,
the improvement of a fixed energy spread over many modes is a real one, to
be demonstrated for a concrete frequency modulation scheme also developed in
section 4.3.



(An aside between multimode and single mode results. It is sometimes 
said that two-mode squeezing is basically different from single mode squeez- 
ing because two modes are needed. However, by a simple modal transformation 
equivalent to removing cross terms in the multimode hamiltonian, two-mode 
squeezing can be reduced to single mode squeezing, that is, in the multimode 
situation one picks the right mode that yields squeezing. The general theory 
is given in [63,64], which indeed was what led to the prediction of squeezing in 
degenerate four-wave mixing [65].)

4.2 Classical rate-distortion limit 

The rate-distortion function $R(d)$ of a random variable $U$ was introduced by
Shannon [11], with by now a very extensive literature. Here we consider just
continuous $U$ with density function $p(u)$ although discrete $U$ works the same.
For a distortion measure $d(u, v)$ between $u$ and $v$, such as $|u - v|^2$ or $|u - v|$, the
average distortion is

    $E[d(U,V)] = \int d(u,v)\, p(u)\, p(v|u)\, du\, dv$.    (57)

The rate distortion function $R(d)$ of $U$ is defined to be the minimum mutual
information

    $R(d) = \min_{E[d(U,V)] \le d} I(U; V)$    (58)

over all possible choices of $p(v|u)$ subject to the constraint that $E[d(U, V)]$ is less
than or equal to a given level $d \ge 0$. One may think of $V$ as a data-compressed
version of $U$: $V$ represents $U$ with an average distortion $d$, thus it takes
fewer bits to represent $V$ than $U$ for $d > 0$. Shannon's source coding theorem
with a fidelity criterion and its converse [7,9,12] state that a source variable U 
can be asymptotically represented with an average distortion d if and only if 
at least R{d) bits per source symbol is provided. Similar to channel coding, 
long sequence encoding and decoding that ensures statistical regularity are in 
general required to achieve such minimum in the asymptotic limit. Nevertheless, 
roughly speaking R{d) is the minimum number of information bits per symbol 
required to represent a source with an average distortion d per symbol. 

The channel capacity $C$ may be written as a function $C(\beta)$ where $\beta$ denotes
the resource parameters available, including power and bandwidth, as well as
other characteristics of the channel such as noise power. Referring to Fig. 1, the
question arises on the minimum distortion $d$ one can obtain for transmitting a
source variable $U$ over a channel with capacity $C(\beta)$. The answer is provided
by Shannon's joint source-channel coding theorem [7,9,12] in the so-called rate
distortion limit or rate distortion bound. Recall that roughly speaking, the
channel coding theorem says that $C$ is the maximum number of information
bits one can transfer error-free over a channel with parameters $\beta$. By combining
the source and channel coding theorems, one has $C(\beta) \ge R(d)$ so that, since $R$
is a monotone decreasing function of $d$,

    $d \ge R^{-1}(C(\beta))$.    (59)

Intuitively, this works as a lower bound as a consequence of the converse to the 
coding theorem because otherwise one can transmit more than the capacity rate 
or compress smaller than the source rate. The positive coding theorem assures 
that the bound may be approached arbitrarily closely in a communications 
situation. 

Even though the rate distortion bound (59) is generally achieved only with
source and channel coding, it is occasionally achieved without any coding or
nonlinear modulation. Consider the transmission of a zero-mean Gaussian $U$ of
variance $\sigma^2$ under the mean-square error criterion, $d(u,v) = |u - v|^2$, over
an additive Gaussian noise channel

    $Y = X + n$    (60)

with noise variance $N$. Then [7,9,11-13] for $U$

    $R_U(d) = \frac{1}{2}\log(\sigma^2/d)$,  $0 \le d \le \sigma^2$    (61)
    $R_U(d) = 0$,  $d \ge \sigma^2$

and

    $C(S) = \frac{1}{2}\log(1 + S/N)$    (62)

under $E[X^2] \le S$. The rate distortion limit (59) becomes

    $d \ge \sigma^2 (1 + S/N)^{-1}$.    (63)

If one sends $U$ as $X$ over the channel as in ( p3| ) so that $S = A^2\sigma^2$, and uses
the estimate (pih, the resulting mean-square error (ES) is exactly the lower
limit (63). This shows that for this problem of transmitting a Gaussian
parameter, matched per use or per symbol to a Gaussian noise channel, even
in the full generality of Fig. |^ there is nothing that can do better than linear
modulation-demodulation! Indeed, no way other than through the rate distortion
bound has ever been successfully employed to show the optimality of linear
modulation in this problem.
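
The coincidence between the bound (63) and linear modulation can be checked numerically. The following is only a minimal sketch under stated assumptions: the channel output is written as `Y = X + n`, the modulation is `X = A*U`, and the receiver uses the linear minimum mean-square estimate of `U` from `Y`. The function names and the numerical values are hypothetical and chosen for illustration; logarithms are natural.

```python
import math

def rate_distortion_gaussian(var_u, d):
    """R(d) in nats for a zero-mean Gaussian source of variance var_u, as in (61)."""
    if d >= var_u:
        return 0.0
    return 0.5 * math.log(var_u / d)

def capacity_awgn(S, N):
    """C(S) in nats for the additive Gaussian noise channel, as in (62)."""
    return 0.5 * math.log(1.0 + S / N)

def distortion_bound(var_u, S, N):
    """Lower limit (63) on the mean-square error."""
    return var_u / (1.0 + S / N)

def linear_scheme_mse(var_u, A, N):
    """Mean-square error of X = A*U followed by the linear estimate gain*Y.

    Assumption: gain is the linear minimum mean-square coefficient
    E[U Y] / E[Y^2] = A*var_u / (A^2*var_u + N).
    """
    S = A * A * var_u
    gain = A * var_u / (S + N)
    # E[(U - gain*Y)^2] = var_u - gain * E[U Y]
    return var_u - gain * A * var_u

# Hypothetical illustrative values.
var_u, A, N = 2.0, 1.5, 0.7
S = A * A * var_u
d_min = distortion_bound(var_u, S, N)
d_lin = linear_scheme_mse(var_u, A, N)
print("bound (63):", d_min)
print("linear modulation-demodulation:", d_lin)
print("R(d_min) and C(S):", rate_distortion_gaussian(var_u, d_min), capacity_awgn(S, N))
```

The two printed distortions agree, and the rate at the minimum distortion equals the capacity, which is the sense in which the uncoded linear system already sits on the rate distortion limit.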
The rate distortion function of the uniform phase variable $\phi$ is difficult to
evaluate exactly. However, the Shannon upper and lower bounds [7,11] on $R(d)$
for $\phi$ differ only by about 0.3 bit per symbol. Thus the upper bound

1 

i?^(d)<log^ (64) 

would be used for $R_\phi(d)$.

There are two complications in the application of the rate distortion bound
to measurement problems. The first arises from the fact that in a measurement
one has little or no room for source coding, as the parameter $U$ is usually out of
one's control before it is modulated onto the physical channel input variable $X$.
Thus, while (59) remains a limit, in general there may be no way to approach
it. One may try to replace $R(d)$ of (61) by some realistic $R(d)$ obtained with
whatever one can do to $U$, but contrary to what is stated in ref [56], it is not
clear how such $R(d)$ may be evaluated. On the other hand, from experience the
$R(d)$ for a Gaussian parameter varies little with different encoding criteria, and
so the exact form of $R(d)$ is not expected to make any major difference in the
final result.

The second problem is, I believe, more serious and closer to the heart of
the matter. It arises because no channel coding may be employed in a typical
measurement situation. One can similarly try to replace $C(\beta)$ by a mutual
information $J(\beta)$ incorporating the realistic limitations and freedom, which again
seems hard to do. This is an essential connection, however, because the
modulation of $U$ into $X$ represents how one physically couples $U$ into the measurement
medium in the measurement system. It makes a difference whether the optical
field couples to $U$ via an interferometer configuration or a source impressing
configuration, and, e.g., whether the frequency or the amplitude is modulated by
$U$. In any case, if such a meaningful $J(\beta)$ can be obtained, then a measurement
rate distortion limit can be obtained from $J(\beta) \ge R(d)$ in a form similar to
(59),

$$d \ge R^{-1}\big(J(\beta)\big). \tag{65}$$

4.3 Ultimate quantum measurement system limit 

By combining the above rate distortion theory with classical capacities replaced
by quantum capacities, one obtains quantum rate distortion limits for a general
quantum system of Fig. ^, taking into account all the freedom of classical
modulation-demodulation and quantum measurement as well as state selections.
Note that the uncertainty principles are far from sufficient to determine such
ultimate limits. In the original form they are merely restrictions on quantum
states, and even in their extended form [20] they do not account for the many
freedoms represented in Fig. ^. A few more remarks on this may be found in [10]
and [59].



From (61) and the single mode version of (32), one finds (B2) using the
optimum number state capacity. With $C_{\rm TCS}$ of (32), one finds the same $\delta u^{\rm TCS}$
as (53) for linear modulation without coding! This is exactly the situation
around (|2^) and (63) pointed out above. For $C_{\rm het}$ of (30), one obtains
which may be compared to 

obtained in coherent state systems without coding or nonlinear modulation. The
reason why coding or nonlinear modulation is necessary in the coherent state
case is that bandwidth expansion, two quadratures in a coherent state versus
the single real parameter to be estimated, has to be utilized. Thus, apart from
a gain on $\delta u$ by a factor of 2, the use of TCS for measurement is essentially
the same as coding on a coherent state system as far as the performance goes,
a rather unexpected result.

For the uniform phase parameter, one finds (|^ from (31) and (64), and
similarly



^TCS 



JS' ^^^^ 
^, (69) 



Again, the $1/S$ behavior can be obtained without coding on TCS or number
state systems, while coherent state systems without coding yield

$$\delta\phi \sim 1/\sqrt{S}. \tag{70}$$

In the multimode narrowband situation, one similarly obtains (55) for the
optimum case, with $m = D/2 = WT$, and

<5w^^^ = a(l + — )-™ (71) 
m 

Su^^ = a(l + -)-™. (72) 
m 



Similar results for $\phi$ can be written down with $\sigma$ replaced by A as in (52) and
§. From (71) and (72),



Su' '''' ^ae''^, Su'^^-^ae-^, D ^ oo (73) 

an exponential decrease in $\delta$ versus $1/S$ in the single mode case. Even though
the narrowband assumption is violated for $D \to \infty$, the exponential improvement
is real from (55) or (71)-(72) as $D$ can be very large at optical frequencies.
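
The passage from the single-mode $1/S$ behavior to the exponential multimode behavior can be illustrated numerically. The sketch below rests on assumptions that are not taken verbatim from the text: the rms error is written as $\delta u = \sigma e^{-C}$, which follows from (59) and (61) with $d = \delta u^2$ and natural logarithms, and the total capacity is assumed to have the form $C = m \ln(1 + cS/m)$ for energy $S$ split evenly over $m$ modes, with `c` a hypothetical system-dependent constant. The function names are illustrative only.

```python
import math

def delta_u_multimode(sigma, S, m, c=1.0):
    """rms error sigma * exp(-C) for the assumed capacity C = m * ln(1 + c*S/m)."""
    return sigma * (1.0 + c * S / m) ** (-m)

def delta_u_limit(sigma, S, c=1.0):
    """Limit of delta_u_multimode as the number of modes m grows without bound."""
    return sigma * math.exp(-c * S)

sigma, S = 1.0, 10.0
for m in (1, 10, 100, 10000):
    # m = 1 reproduces the single-mode 1/(1 + c*S) behavior.
    print(m, delta_u_multimode(sigma, S, m), delta_u_limit(sigma, S))
```

For fixed total energy the error falls monotonically as the energy is spread over more modes, approaching the exponential limit from above.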

To show that a multimode system is indeed better for measurement for the
same total energy, consider the following pulse frequency modulation scheme

$$X(t,\phi) = \sqrt{\frac{2S}{T}}\,\sin(\omega_0 + \beta\phi)t, \qquad 0 \le t \le T, \tag{74}$$






where $\beta$ is a known fixed constant and $S$ the total energy in the signal.
Classically, it is known [8] that in the presence of additive white Gaussian noise,
the use of (74) and corresponding nonlinear demodulation leads to a decrease of
root-mean-square error $\delta\phi$ by a factor ^ ^ compared to the linear modulation
case, when a threshold constraint involving $S$, $T$, $\beta$ and the noise variance $N_0$
is satisfied, which occurs for sufficiently large $D$ or $S$. If (74) is used in either a
coherent state-heterodyne or a TCS-homodyne system, one would obtain

S^^^' ^ ^, 5^^' ^ ^ (75) 



compared to (|6^) and (70). While showing the importance of bandwidth, the
net gain $1/m$ is in itself already significant as $D$ is large.



5 Position monitoring with contractive states 

As a final application of squeezed states, we discuss the problem of repeatedly
measuring the position of a free mass, for which the state after each measurement
is important as it determines the state at the next measurement instant.
This feature makes the problem, relevant to gravitational-wave interferometers
treated elsewhere in this book, quite different from the other ones we have
discussed so far in this chapter, for which all the information can be extracted
from the system by one measurement. There is still considerable confusion in
the literature on the validity of the so-called "standard quantum limit" (SQL)
on how small the position fluctuation $\langle \Delta X^2(t) \rangle$ can be made in a sequence
of position measurements, although the issues in principle were cleared up
entirely over ten years ago. Perhaps this is partly because some of the following
clarification never appeared in print.

The SQL states that [66,67] if a position measurement is made at $t = 0$, the
fluctuation at $t > 0$ is at least

$$\langle \Delta X^2(t) \rangle_{\rm SQL} = \hbar t/m, \tag{76}$$

where $m$ is the mass of a fermion. The derivation of (76), however, was
incorrectly taken to be universally valid as a consequence of the Uncertainty Principle,
and it was concluded that the free mass position is not a "QND observable",
namely, that the disturbance to the system from the first position measurement
demolishes the possibility of an accurate second measurement after an interval
of free evolution. To delineate how a position monitoring scheme works,
consider the monitoring of weak classical forces $f_1(t), f_2(t)$ coupled linearly to a
free mass with position $X$ and momentum $P$,

$$H_I = f_1(t)X + f_2(t)P. \tag{77}$$

In the Heisenberg picture,

$$X(t) = X(0) + P(0)t/m + \int_0^t f_2(t')\,dt' - \int_0^t dt' \int_0^{t'} d\tau\, f_1(\tau)/m, \tag{78}$$

$$P(t) = P(0) - \int_0^t f_1(t')\,dt'. \tag{79}$$







Typically, $f_2 = 0$ and $X$ is more readily measurable than $P$ in practice. From
(78), information on $f_1(t)$ can be obtained by measurements on $X(t)$ at different
times.
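
Because the equations of motion behind (78) and (79) are linear, the same equations hold for the expectation values, and the closed forms can be checked by direct numerical integration. The sketch below is an illustration under assumptions: it takes a hypothetical force $f_1(t) = F\cos(\Omega t)$ with $f_2 = 0$, integrates $dX/dt = P/m + f_2$, $dP/dt = -f_1$ by the midpoint rule, and compares with (78)-(79) evaluated analytically. All names and parameter values are chosen for illustration.

```python
import math

def integrate_free_mass(x0, p0, m, f1, f2, t_end, steps=20000):
    """Midpoint-rule integration of dX/dt = P/m + f2(t), dP/dt = -f1(t)."""
    dt = t_end / steps
    x, p, t = x0, p0, 0.0
    for _ in range(steps):
        # Advance the momentum half a step to evaluate the midpoint slope of X.
        p_mid = p - 0.5 * dt * f1(t)
        t_mid = t + 0.5 * dt
        x += dt * (p_mid / m + f2(t_mid))
        p -= dt * f1(t_mid)
        t += dt
    return x, p

# Hypothetical illustrative parameters.
x0, p0, m = 0.3, -0.2, 2.0
F, Omega, t_end = 0.5, 3.0, 2.0

def f1(t):
    return F * math.cos(Omega * t)

def f2(t):
    return 0.0

x_num, p_num = integrate_free_mass(x0, p0, m, f1, f2, t_end)

# Closed forms from (78) and (79) for this choice of f1:
# the double integral of F*cos(Omega*tau) is F*(1 - cos(Omega*t))/Omega**2.
x_exact = x0 + p0 * t_end / m - F * (1.0 - math.cos(Omega * t_end)) / (m * Omega ** 2)
p_exact = p0 - F * math.sin(Omega * t_end) / Omega
print(x_num, x_exact)
print(p_num, p_exact)
```

The force enters the position only through the double time integral, which is why repeated position readings at different times carry information on $f_1(t)$.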

If a position measurement at $t = 0$ is made in the sense of Pauli's first-kind
measurement [68], the position eigenstates $|X\rangle$ are to be used to compute the
measurement probability, and the state of the mass after the measurement
with a reading $X'$ is $|X'\rangle$. Thus, a first-kind measurement of a selfadjoint
operator is one for which the von Neumann projection postulate applies. From (79),
the position fluctuation $\langle \Delta X^2(t) \rangle$ is often concluded to be infinite, because the
"back-action" causes $\langle \Delta P^2(0) \rangle = \infty$ with $\langle \Delta X^2(0) \rangle = 0$ from the Uncertainty
Principle. Since $\langle P^2(0) \rangle = \langle \Delta P^2(0) \rangle + \langle P(0) \rangle^2$, an infinite average energy is
obtained for the mass in a position eigenstate $|X\rangle$. Thus one can actually only make
"approximate" position measurements, which are generally described by POM
as far as the measurement statistics goes, with $\langle \Delta X^2(0) \rangle > 0$. In any event,
it was concluded that whatever position measurement is used, the Uncertainty
Principle implies the SQL (76).

In [69], it was pointed out that this conclusion is not valid from (78) when
$\langle \Delta X(0)\Delta P(0) + \Delta P(0)\Delta X(0) \rangle$ is negative. It was also pointed out that $\langle \Delta X^2(t) \rangle$
can be arranged to be as small as desired at any $t > 0$ if the state after
measurement is left in a "contractive state" $|\mu\nu\alpha\omega\rangle$, which is a TCS $|\mu\nu\alpha\rangle$ with
the frequency $\omega$ put back explicitly and the parameters $\mu, \nu$ chosen
appropriately so that the "generalized minimum uncertainty wave packet" $\langle X|\mu\nu\alpha\omega\rangle$
contracts rather than spreads in $t$ up to a desired measurement time. It was
observed that measurements of the second kind, in particular a class of
measurements formally described by Gordon and Louisell, may be used to beat the
SQL. Specifically, the measurement described by [68,69]

$$|\mu\nu\alpha\omega\rangle\langle\mu'\nu'\alpha\omega| \tag{80}$$

would work, where $|\mu'\nu'\alpha\omega\rangle$ is used to compute the measurement probability
with reading $\alpha = \alpha_1 + i\alpha_2$,

$$\alpha_1 = x(m\omega/2\hbar)^{1/2}, \qquad \alpha_2 = p/(2\hbar m\omega)^{1/2}, \tag{81}$$

which may be considered a joint approximate measurement of $X$ and $P$ similar
to TCS-heterodyne [22], and $|\mu\nu\alpha\omega\rangle$ is the state after measurement of reading
$\alpha$, arranged to be a contractive state for the next measurement. The position
measurement would be sharp if $\langle \Delta X^2 \rangle \sim |\mu' - \nu'|^2 \to 0$, while $\mu, \nu$ are chosen
so that the mass state has a sharply defined position at the next measurement
instant.
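
The role of the correlation term noted in [69] can be made concrete with a small numerical check. The sketch below does not use the TCS parameters $\mu, \nu$ of the text. Instead it assumes a pure Gaussian wave packet specified by its initial variances and symmetrized covariance, which satisfy $\langle \Delta X^2 \rangle \langle \Delta P^2 \rangle - c^2 = \hbar^2/4$ with $c = \frac{1}{2}\langle \Delta X \Delta P + \Delta P \Delta X \rangle$, and free evolution $X(t) = X(0) + P(0)t/m$ from (78). Units with $\hbar = 1$ and all names and values are hypothetical.

```python
HBAR = 1.0  # units with hbar = 1 (assumption for illustration)

def position_variance(t, m, var_x0, var_p0, cov_xp0):
    """<dX^2(t)> under free evolution; cov_xp0 = <dX dP + dP dX>/2 at t = 0."""
    return var_x0 + 2.0 * cov_xp0 * t / m + var_p0 * (t / m) ** 2

def pure_gaussian_var_x0(var_p0, cov_xp0):
    """Initial position variance of a pure Gaussian state,
    for which var_x * var_p - cov_xp**2 = HBAR**2 / 4."""
    return (HBAR ** 2 / 4.0 + cov_xp0 ** 2) / var_p0

def sql(t, m):
    """Standard quantum limit (76)."""
    return HBAR * t / m

m, t = 1.0, 1.0

# Uncorrelated minimum uncertainty packet, with the momentum variance
# chosen to minimize the position variance at time t.
var_p_unc = HBAR * m / (2.0 * t)
var_uncorrelated = position_variance(
    t, m, pure_gaussian_var_x0(var_p_unc, 0.0), var_p_unc, 0.0)

# Negatively correlated ("contractive") packet whose position variance
# reaches its minimum HBAR**2 / (4 * var_p) exactly at time t.
var_p_con = 50.0
cov_con = -var_p_con * t / m
var_contractive = position_variance(
    t, m, pure_gaussian_var_x0(var_p_con, cov_con), var_p_con, cov_con)

print("SQL:", sql(t, m))
print("best uncorrelated packet:", var_uncorrelated)
print("contractive packet:", var_contractive)
```

With zero correlation the best one can do is to meet the SQL, while a sufficiently negative correlation makes the wave packet contract to a variance well below it, at the price of a large initial position spread.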

Two criticisms were made on the success of this approach to beat the SQL.
First, it was pointed out that it was not clear that a measurement described as in (80)
is realizable in principle. A quantum measurement realization can be described
by the coupling of a "probe" to the system with commuting selfadjoint operators
being measured on the probe, and with all the quantities computed by the
usual rules of quantum mechanics (without the need for the projection postulate,
as emphasized by Ozawa [16]). While two realizations were produced [71], they
were criticized on the ground that the probe-system interaction hamiltonians $H_I$
are time-dependent and so are equivalent to "state preparation." While these
realizations are actually quite different from the state preparations that were
discussed and are in fact full-fledged quantum measurement realizations in
accordance with standard quantum measurement theory, the situation was resolved
beyond dispute when a time-independent $H_I$ was found [72] for realizing (80).
More significantly, Ozawa [16,73] has obtained a complete characterization of
quantum measurement including the state after measurement in the concept of a
completely positive operation measure mentioned in Section 3.1, and he showed
that any Gordon-Louisell measurement representation, in which a complete but
not necessarily orthogonal set of states is used to yield the measurement
statistics and the state after measurement depends only on the measured value, is indeed
realizable.

To discuss the second criticism, one needs to examine more closely how the
measurement scheme based on (80) actually works. Let $\alpha'$ be the reading at
$t = 0$ so that the state at $t = 0+$ is $|\mu\nu\alpha'\omega\rangle$. After another time $t$, the free mass
is in state $|\mu_t\nu_t\alpha'_t\omega\rangle$ with $|\mu_t - \nu_t| \approx 0$. From (78)-(79) with $f_2 = 0$, the value
$\alpha'_t$ is given by



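$$\alpha'_{t1} = \alpha'_1 + \alpha'_2 t/m - \int_0^t dt' \int_0^{t'} d\tau\, f_1(\tau)/m, \tag{82}$$

$$\alpha'_{t2} = \alpha'_2 - \int_0^t f_1(\tau)\,d\tau. \tag{83}$$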



Equations (82) and (83) provide the average of the reading $\alpha$ at $t$, which can
be represented by

$$\alpha_1 = \alpha'_1 + \alpha'_2 t/m - \int_0^t dt' \int_0^{t'} d\tau\, f_1(\tau)/m + n_1, \tag{84}$$

$$\alpha_2 = \alpha'_{t2} + n_2, \tag{85}$$



where the fluctuation of $n_1$ is vanishingly small from $|\mu_t - \nu_t| \approx 0$ while the
noise $n_2$ is big. From (84), one may use the reading $\alpha_1$ to estimate $f_1$ after it
is subtracted from the value of $\alpha'_1 + \alpha'_2 t/m$ known at time $t$. The reading $\alpha_2$ is
also taken so that it could be used for the subtraction at the next measurement,
although it is not used for estimating $f_1$ as it is noisy and helps little. It is
clear that this scheme beats the SQL to any arbitrary level in a sequence of
measurements.
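
The subtraction step of this scheme can be sketched as a toy simulation. The code below is an illustration under several assumptions that are not in the text: the force is constant over the interval, the small noise $n_1$ is modeled as zero-mean Gaussian, and, following the loose notation of (84), the reading components are treated directly as position-like and momentum-like values. All function names and numbers are hypothetical.

```python
import random

def simulate_reading(alpha1_prev, alpha2_prev, force, m, t, noise_std, rng):
    """Position-like reading of (84) for a constant force acting over [0, t]."""
    # For a constant force f the double time integral of f/m is f * t**2 / (2 m).
    double_integral = force * t * t / (2.0 * m)
    n1 = rng.gauss(0.0, noise_std)
    return alpha1_prev + alpha2_prev * t / m - double_integral + n1

def estimate_force(reading, alpha1_prev, alpha2_prev, m, t):
    """Subtract the reading from the known free-evolution value and invert."""
    predicted = alpha1_prev + alpha2_prev * t / m
    return 2.0 * m * (predicted - reading) / (t * t)

rng = random.Random(0)          # fixed seed so the run is repeatable
m, t = 1.0, 1.0
alpha1_prev, alpha2_prev = 0.4, -1.3   # previous reading, known at time t
true_force = 0.25
noise_std = 1e-3                # small n_1, as for a contractive state

reading = simulate_reading(alpha1_prev, alpha2_prev, true_force, m, t, noise_std, rng)
force_hat = estimate_force(reading, alpha1_prev, alpha2_prev, m, t)
print(true_force, force_hat)
```

The estimation error is set entirely by the size of $n_1$, since the previous reading is known and drops out in the subtraction.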

In [74], a "predictive sense" of the SQL was proposed to suggest that the
SQL was not beaten in that sense. This predictive sense can be described by
the stipulation that prior to any measurement, $\alpha'_1$ and $\alpha'_2$ in (84) and (85) are
unknown and random, thus $\alpha_1$ is also more random than $n_1$ and indeed obeys
the SQL. But since we know we will have the reading value $\alpha'$ available at $t$,
which would be subtracted from (84), the reading $\alpha$ at $t$, we can indeed predict
that we will get $\langle \Delta X^2(0) \rangle$, $\langle \Delta X^2(t) \rangle$, and so on, arbitrarily small. Thus, the SQL
is beaten by (80) in the predictive sense. Further elucidation of this point and
discussion on the working of the scheme (80)-(85) was provided in [75].
Actually, this issue would not even arise if the measurement

$$|\mu\nu 0\omega\rangle\langle\mu'\nu'\alpha\omega| \tag{86}$$

is employed instead of (80), for which the state after measurement always has
$\langle X \rangle = \langle P \rangle = 0$. This measurement is a special degenerate case of
Gordon-Louisell measurement, and thus realizable by Ozawa's theorem. Indeed, an
explicit hamiltonian realization can be developed for (86) [76,77].

Since the positions of a free mass can be repeatedly measured accurately, 
it is not appropriate to say that X is not a QND observable. The term QND 
measurement is often used just to refer to a first-kind measurement, which is an 
acceptable terminology. What has never been demonstrated is that there is, in 
principle, any observable which is not a QND observable in the generic sense. In 
fact, it should be clear from the development in this section, and it can indeed 
be readily shown in principle, that any observable can be repeatedly measured 
arbitrarily accurately in the absence of particular constraints. The key point
is that, as in (80), the state used to compute the measurement probability and
the state after measurement need not be the same.